Index of all elements
[ a ]
[ b ]
[ c ]
[ d ]
[ e ]
[ f ]
[ g ]
[ h ]
[ i ]
[ j ]
[ k ]
[ l ]
[ m ]
[ n ]
[ o ]
[ p ]
[ q ]
[ r ]
[ s ]
[ t ]
[ u ]
[ v ]
[ w ]
[ x ]
[ z ]
[ _ ]
a
- $active_tiers
- in file index_dictionary.php, variable IndexDictionary::$active_tiers
Tiers which currently have data for reading
- $activities
- in file admin_controller.php, variable AdminController::$activities
Says which activities (roughly methods invoked from the web) this
- $activities
- in file archive_controller.php, variable ArchiveController::$activities
The only legal activity this controller will accept is a request
- $activities
- in file fetch_controller.php, variable FetchController::$activities
These are the activities supported by this controller
- $activities
- in file resource_controller.php, variable ResourceController::$activities
These are the activities supported by this controller
- $activities
- in file search_controller.php, variable SearchController::$activities
Says which activities (roughly methods invoked from the web) this
- $activities
- in file crawl_controller.php, variable CrawlController::$activities
These are the activities supported by this controller
- $activities
- in file static_controller.php, variable StaticController::$activities
Says which activities (roughly methods invoked from the web)
- $activities
- in file machine_controller.php, variable MachineController::$activities
These are the activities supported by this controller
- $additional_meta_words
- in file phrase_model.php, variable PhraseModel::$additional_meta_words
An associative array of additional meta words and the max description length of results if such a meta word is used. This array is typically set in index.php
- $admin
- in file configure_tool.php, variable ConfigureTool::$admin
Used to hold an AdminController object used to manipulate the
- $allowed_sites
- in file queue_server.php, variable QueueServer::$allowed_sites
Web-sites that crawler can crawl. If used, ONLY these will be crawled
- $allowed_sites
- in file fetcher.php, variable Fetcher::$allowed_sites
Web-sites that crawler can crawl. If used, ONLY these will be crawled
- $all_file_types
- in file queue_server.php, variable QueueServer::$all_file_types
List of all known file extensions including those not used for crawl
- $all_file_types
- in file fetcher.php, variable Fetcher::$all_file_types
List of all known file extensions including those not used for crawl
- $archive
- in file web_archive_bundle_iterator.php, variable WebArchiveBundleIterator::$archive
The web archive bundle being iterated over
- $archive_iterator
- in file fetcher.php, variable Fetcher::$archive_iterator
If a web archive crawl (i.e. a re-crawl) is active then this field
- $archive_modified_time
- in file queue_server.php, variable QueueServer::$archive_modified_time
This keeps track of the time the current archive info was last modified. This way the queue_server knows if the user has changed the crawl parameters during the crawl.
- $arc_dir
- in file fetcher.php, variable Fetcher::$arc_dir
For a non-web archive crawl, holds the path to the directory that
- $arc_type
- in file fetcher.php, variable Fetcher::$arc_type
For an archive crawl, holds the name of the type of archive being
- $attributes
- in file epub_processor.php, variable EpubProcessor::$attributes
The attribute of the tag element in an xml document
- ActivityElement
- in file activity_element.php, class ActivityElement
This element is used to display the list of available activities in the AdminView
- ActivityModel
- in file activity_model.php, class ActivityModel
This class is used to handle db results related to Administration Activities
- acuteByGrave
- in file tokenizer.php, method ItStemmer::acuteByGrave()
Replaces all acute accents in a string by grave accents and also handles accented characters
- add
- in file bloom_filter_bundle.php, method BloomFilterBundle::add()
Inserts a $value into the BloomFilterBundle
- add
- in file bloom_filter_file.php, method BloomFilterFile::add()
Inserts the provided item into the Bloomfilter
- add
- in file trie.php, method Trie::add()
Adds a term to the Trie
- addActivityRole
- in file role_model.php, method RoleModel::addActivityRole()
Add an allowed activity to an existing role
- addCacheJavascriptTags
- in file search_controller.php, method SearchController::addCacheJavascriptTags()
Adds to the supplied node subnodes containing script tags for the javascript libraries used to display cache pages
- addContainsRobotTxtFilterTestCase
- in file web_queue_bundle_test.php, method WebQueueBundleTest::addContainsRobotTxtFilterTestCase()
Checks the two methods addGotRobotTxtFilter($host) and
- addCount
- in file web_archive_bundle.php, method WebArchiveBundle::addCount()
Updates the description file with the current count for the number of items in the WebArchiveBundle. If the $field item is used counts of additional properties (visited urls say versus total urls) can be maintained.
- addDNSCache
- in file web_queue_bundle.php, method WebQueueBundle::addDNSCache()
Add an entry to the web_queue_bundles DNS cache
- addDocumentsGetPostingsSliceByIdTestCase
- in file index_shard_test.php, method IndexShardTest::addDocumentsGetPostingsSliceByIdTestCase()
Checks if we can store documents into an index shard and retrieve them
- addDocumentWords
- in file index_shard.php, method IndexShard::addDocumentWords()
Add a new document to the index shard with the given summary offset.
- addFeedItemIfNew
- in file source_model.php, method SourceModel::addFeedItemIfNew()
Adds words extracted from feed data in $item to $feed_shard and adds $item to db if it isn't already there
- addGotRobotTxtFilter
- in file web_queue_bundle.php, method WebQueueBundle::addGotRobotTxtFilter()
Adds the supplied $host to the got_robottxt_filter
- addIndexData
- in file index_archive_bundle.php, method IndexArchiveBundle::addIndexData()
Adds the provided mini inverted index data to the IndexArchiveBundle. Expects initGenerationToAdd to be called before, so generation is correct
- addKeywordLink
- in file page_rule_parser.php, method PageRuleParser::addKeywordLink()
Adds a $keywords => $link_text pair to the KEYWORD_LINKS array fro
- addKeywordLinks
- in file search_controller.php, method SearchController::addKeywordLinks()
Function used to add links for keyword searches in the keyword_links array of $cache_item to the text of the $web_page whose cache we are going to display as part of a cache page request
- addLinkGetPostingsSliceByIdTestCase
- in file index_shard_test.php, method IndexShardTest::addLinkGetPostingsSliceByIdTestCase()
Checks if we can store link documents into an index shard and retrieve them
- addLocale
- in file locale_model.php, method LocaleModel::addLocale()
Adds information concerning a new locale to the database
- addMachine
- in file machine_model.php, method MachineModel::addMachine()
Add a rolename to the database using provided string
- addMediaSource
- in file source_model.php, method SourceModel::addMediaSource()
Used to add a new video, rss, or other sources to Yioop
- addMetaWord
- in file page_rule_parser.php, method PageRuleParser::addMetaWord()
Adds a meta word u:$field:$page_data[$field_name] to the array of meta words for this page
- addObjects
- in file web_archive.php, method WebArchive::addObjects()
Adds objects to the WebArchive
- addObjectTestCase
- in file web_archive_test.php, method WebArchiveTest::addObjectTestCase()
Inserts three objects into a web archive. To look up an object in a web
- addPages
- in file index_archive_bundle.php, method IndexArchiveBundle::addPages()
Add the array of $pages to the summaries WebArchiveBundle pages being stored in the partition $generation and the field used to store the resulting offsets given by $offset_field.
- addPages
- in file web_archive_bundle.php, method WebArchiveBundle::addPages()
Add the array of $pages to the WebArchiveBundle pages being stored in the partition according to write partition and the field used to store the resulting offsets given by $offset_field.
- addQueueTestCase
- in file web_queue_bundle_test.php, method WebQueueBundleTest::addQueueTestCase()
Does two adds to the WebQueueBundle of urls and weight. Then checks the contents of the queue to see if as expected. Then does a rebuild on the hash table of the queue and checks that the contents have not changed.
- addRegexDelimiters
- in file utility.php, function addRegexDelimiters()
Adds delimiters to a regex that may or may not have them
- addRobotPaths
- in file web_queue_bundle.php, method WebQueueBundle::addRobotPaths()
Adds all the paths for a host to the Robots Web Archive.
- addRobotPathsCheckRobotOkayTestCase
- in file web_queue_bundle_test.php, method WebQueueBundleTest::addRobotPathsCheckRobotOkayTestCase()
Tests the methods addRobotPaths and checkRobotOkay
- addRole
- in file role_model.php, method RoleModel::addRole()
Add a rolename to the database using provided string
- addScheduleToScheduleDirectory
- in file fetch_controller.php, method FetchController::addScheduleToScheduleDirectory()
Adds a file with contents $data and with name containing $address and $time to a subfolder $day of a folder $dir
- addSearchViewData
- in file search_controller.php, method SearchController::addSearchViewData()
Prepares the array $data so the SearchView can draw search results
- addSeenUrlFilter
- in file web_queue_bundle.php, method WebQueueBundle::addSeenUrlFilter()
Adds the supplied url to the url_exists_filter_bundle
- addShardDictionary
- in file index_dictionary.php, method IndexDictionary::addShardDictionary()
Adds the words in the provided IndexShard to the dictionary.
- addSubsearch
- in file source_model.php, method SourceModel::addSubsearch()
Adds a new subsearch to the list of subsearches. These are displayed at the top of the Yioop search pages.
- addTestCase
- in file trie_test.php, method TrieTest::addTestCase()
Check if we add something into our Trie, add returns the correct
- addToCrawlSites
- in file fetcher.php, method Fetcher::addToCrawlSites()
Used to add a set of links from a web page to the array of sites which need to be crawled.
- addUrlsQueue
- in file web_queue_bundle.php, method WebQueueBundle::addUrlsQueue()
Adds an array of (url, weight) pairs to the WebQueueBundle.
- addUser
- in file user_model.php, method UserModel::addUser()
Add a user with a given username and password to the list of users that can login to the admin panel
- addUserRole
- in file user_model.php, method UserModel::addUserRole()
Adds a role to a given user
- adjustQueueWeight
- in file web_queue_bundle.php, method WebQueueBundle::adjustQueueWeight()
Adjusts the weight of the given url in the priority queue by amount delta
- adjustWeight
- in file priority_queue.php, method PriorityQueue::adjustWeight()
Adds $delta to the $ith element in the priority queue and then adjusts the queue to restore the heap property
- AdminController
- in file admin_controller.php, class AdminController
Controller used to handle admin functionalities such as modifying login and password, and CREATE, UPDATE, DELETE operations for users, roles, locale, and crawls
- AdminView
- in file admin_view.php, class AdminView
View responsible for drawing the admin pages of the SeekQuarry search engine site
- advance
- in file intersect_iterator.php, method IntersectIterator::advance()
Forwards the iterator one group of docs
- advance
- in file network_iterator.php, method NetworkIterator::advance()
Forwards the iterator one group of docs
- advance
- in file word_iterator.php, method WordIterator::advance()
Forwards the iterator one group of docs
- advance
- in file negation_iterator.php, method NegationIterator::advance()
Forwards the iterator one group of docs (must be size 1)
- advance
- in file doc_iterator.php, method DocIterator::advance()
Forwards the iterator one group of docs
- advance
- in file union_iterator.php, method UnionIterator::advance()
Forwards the iterator one group of docs
- advance
- in file group_iterator.php, method GroupIterator::advance()
Forwards the iterator one group of docs
- advance
- in file index_bundle_iterator.php, method IndexBundleIterator::advance()
Forwards the iterator one group of docs
- advanceGeneration
- in file doc_iterator.php, method DocIterator::advanceGeneration()
Switches which index shard is being used to return occurrences of the word to the next shard containing the word
- advanceGeneration
- in file word_iterator.php, method WordIterator::advanceGeneration()
Switches which index shard is being used to return occurrences of the word to the next shard containing the word
- advanceSeenDocs
- in file index_bundle_iterator.php, method IndexBundleIterator::advanceSeenDocs()
Updates the seen_docs count during an advance() call
- advanceSeenDocs
- in file doc_iterator.php, method DocIterator::advanceSeenDocs()
Updates the seen_docs count during an advance() call
- advanceSeenDocs
- in file word_iterator.php, method WordIterator::advanceSeenDocs()
Updates the seen_docs count during an advance() call
- AD_HOC_TITLE_LENGTH
- in file config.php, constant AD_HOC_TITLE_LENGTH
Number of total description deemed title
- affectedRows
- in file datasource_manager.php, method DatasourceManager::affectedRows()
Returns the number of rows affected by the last sql statement
- affectedRows
- in file sqlite3_manager.php, method Sqlite3Manager::affectedRows()
- affectedRows
- in file sqlite_manager.php, method SqliteManager::affectedRows()
- affectedRows
- in file mysql_manager.php, method MysqlManager::affectedRows()
- affectedRows
- in file pdo_manager.php, method PdoManager::affectedRows()
- AGENT_LIST
- in file crawl_constants.php, class constant CrawlConstants::AGENT_LIST
- aggregateCrawlList
- in file crawl_model.php, method CrawlModel::aggregateCrawlList()
When @see getCrawlList() is used in a multi-queue_server setting, this method is used to integrate the crawl lists received by the different machines
- aggregateScores
- in file group_iterator.php, method GroupIterator::aggregateScores()
For a collection of pages each with the same url, computes the page with the min score, max score, as well as the sum of the score, sum of the ranks, sum of the relevance score, and count. Stores this information in the first element of the array of pages.
- aggregateStalled
- in file crawl_model.php, method CrawlModel::aggregateStalled()
When @see crawlStalled() is used in a multi-queue_server setting, this method is used to integrate the stalled information received by the different machines
- aggregateStatuses
- in file crawl_model.php, method CrawlModel::aggregateStatuses()
When @see crawlStatus() is used in a multi-queue_server setting, this method is used to integrate the status information received by the different machines
- allowedToCrawlSite
- in file queue_server.php, method QueueServer::allowedToCrawlSite()
Checks if url belongs to a list of sites that are allowed to be crawled and that the file type is crawlable
- allowedToCrawlSiteTestCase
- in file queue_server_test.php, method QueueServerTest::allowedToCrawlSiteTestCase()
allowedToCrawlSite checks if a url matches a list of url
- ALLOWED_SITES
- in file crawl_constants.php, class constant CrawlConstants::ALLOWED_SITES
- ALWAYS_IGNORE
- in file ppt_processor.php, class constant PptProcessor::ALWAYS_IGNORE
- ALWAYS_RETURN_PROBE
- in file hash_table.php, class constant HashTable::ALWAYS_RETURN_PROBE
Flag for hash table lookup methods
- AnalyticsManager
- in file analytics_manager.php, class AnalyticsManager
Used to set and get SQL query and search query timing statistic between models and index_bundle_iterators
- API_ACCESS
- in file config.php, constant API_ACCESS
- appendIndexShard
- in file index_shard.php, method IndexShard::appendIndexShard()
Adds the contents of the supplied $index_shard to the current index shard
- appendIndexShardTestCase
- in file index_shard_test.php, method IndexShardTest::appendIndexShardTestCase()
Check that appending two index shards works correctly
- APP_DIR
- in file config.php, constant APP_DIR
- ArcArchiveBundleIterator
- in file arc_archive_bundle_iterator.php, class ArcArchiveBundleIterator
Used to iterate through the records of a collection of arc files stored in
- ArchiveBundleIterator
- in file archive_bundle_iterator.php, class ArchiveBundleIterator
Abstract class used to model iterating over documents indexed in a WebArchiveBundle or set of such bundles.
- ArchiveController
- in file archive_controller.php, class ArchiveController
Fetcher machines also act as archives for complete caches of web pages; this controller is used to handle access to these web page caches
- archiveSchedule
- in file fetch_controller.php, method FetchController::archiveSchedule()
Checks to see whether there are more pages to extract from the current
- archive_base_name
- in file crawl_constants.php, class constant CrawlConstants::archive_base_name
- ARCHIVE_BATCH_SIZE
- in file config.php, constant ARCHIVE_BATCH_SIZE
number of pages to extract from an archive in one go
- ARCHIVE_CRAWL
- in file crawl_constants.php, class constant CrawlConstants::ARCHIVE_CRAWL
- ARCHIVE_LOCK_TIMEOUT
- in file config.php, constant ARCHIVE_LOCK_TIMEOUT
Time in seconds to wait to acquire an exclusive lock before we're no longer allowed to extract the next batch of pages for an archive crawl. This is intended to prevent a fetcher from waiting to acquire the lock, then getting it just before cURL gives up and times out the request.
- ArcTool
- in file arc_tool.php, class ArcTool
Command line program that allows one to examine the content of the WebArchiveBundles and IndexArchiveBundles of Yioop crawls.
- ARC_DATA
- in file crawl_constants.php, class constant CrawlConstants::ARC_DATA
- ARC_DIR
- in file crawl_constants.php, class constant CrawlConstants::ARC_DIR
- ARC_TYPE
- in file crawl_constants.php, class constant CrawlConstants::ARC_TYPE
- assertEqual
- in file unit_test.php, method UnitTest::assertEqual()
Checks that $x and $y are the same, the result of the test is added to $this->test_case_results
- assertFalse
- in file unit_test.php, method UnitTest::assertFalse()
Checks that $x can be coerced to false; the result of the test is added to $this->test_case_results
- assertNotEqual
- in file unit_test.php, method UnitTest::assertNotEqual()
Checks that $x and $y are not the same, the result of the test is added to $this->test_case_results
- assertTrue
- in file unit_test.php, method UnitTest::assertTrue()
Checks that $x can be coerced to true; the result of the test is added to $this->test_case_results
- AUTH_KEY
- in file config.php, constant AUTH_KEY
- autoIncrement
- in file datasource_manager.php, method DatasourceManager::autoIncrement()
Returns string for given DBMS CREATE TABLE equivalent to auto_increment (at least as far as Yioop requires).
- AVERAGE_DESCRIPTION_LENGTH
- in file crawl_constants.php, class constant CrawlConstants::AVERAGE_DESCRIPTION_LENGTH
- AVERAGE_TITLE_LENGTH
- in file crawl_constants.php, class constant CrawlConstants::AVERAGE_TITLE_LENGTH
- AVERAGE_TOTAL_LINK_TEXT_LENGTH
- in file crawl_constants.php, class constant CrawlConstants::AVERAGE_TOTAL_LINK_TEXT_LENGTH
- arc_tool.php
- procedural page arc_tool.php
- admin_controller.php
- procedural page admin_controller.php
- archive_controller.php
- procedural page archive_controller.php
- analytics_manager.php
- procedural page analytics_manager.php
- archive_bundle_iterator.php
- procedural page archive_bundle_iterator.php
- arc_archive_bundle_iterator.php
- procedural page arc_archive_bundle_iterator.php
- activity_model.php
- procedural page activity_model.php
- admin_view.php
- procedural page admin_view.php
- activity_element.php
- procedural page activity_element.php
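The PriorityQueue::adjustWeight() entry in this section describes adding $delta to the $i-th element and then fixing up the heap. A minimal Python sketch of that idea follows, assuming a 0-indexed binary max-heap kept as a list of [weight, item] pairs. The function and helper names are hypothetical; this is an illustration, not Yioop's PHP implementation.

```python
def adjust_weight(heap, i, delta):
    """Add delta to the weight at index i, then restore the max-heap property.

    heap is a list of [weight, item] pairs (an assumption of this sketch).
    Returns the index at which the adjusted element ends up.
    """
    heap[i][0] += delta
    # A larger weight may have to move toward the root ...
    i = _sift_up(heap, i)
    # ... and a smaller one toward the leaves.
    return _sift_down(heap, i)


def _sift_up(heap, i):
    # Swap with the parent while the parent has a strictly smaller weight.
    while i > 0:
        parent = (i - 1) // 2
        if heap[parent][0] >= heap[i][0]:
            break
        heap[parent], heap[i] = heap[i], heap[parent]
        i = parent
    return i


def _sift_down(heap, i):
    # Swap with the heavier child while some child has a larger weight.
    n = len(heap)
    while True:
        largest = i
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and heap[child][0] > heap[largest][0]:
                largest = child
        if largest == i:
            return i
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest
```

Only one of the two sift passes does any work for a given call: a positive delta can only violate the heap property upward, a negative delta only downward.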
b
- $base_query
- in file network_iterator.php, variable NetworkIterator::$base_query
Part of query without limit and num to be processed by all queue_server machines
- $bits
- in file bzip2_block_iterator.php, variable BZip2BlockIterator::$bits
Stores the left over bits of a bz2 block
- $block
- in file bzip2_block_iterator.php, variable BZip2BlockIterator::$block
Used to build and store a bz2 block from the file stream
- $blocks
- in file index_shard.php, variable IndexShard::$blocks
A cached array of disk blocks for an index shard that has not been completely loaded into memory.
- $blocks
- in file index_dictionary.php, variable IndexDictionary::$blocks
A cached array of disk blocks for an index dictionary that has not been completely loaded into memory.
- $buffer
- in file bzip2_block_iterator.php, variable BZip2BlockIterator::$buffer
Since block sizes are not constant used to store sufficiently many
- $buffer
- in file text_archive_bundle_iterator.php, variable TextArchiveBundleIterator::$buffer
Used to buffer data from the currently opened file
- $buffer
- in file tokenizer.php, variable EnStemmer::$buffer
Storage used in computing the stem
- $buffer
- in file tokenizer.php, variable ItStemmer::$buffer
Storage used in computing the stem
- $buffer_block_num
- in file text_archive_bundle_iterator.php, variable TextArchiveBundleIterator::$buffer_block_num
Which block of self::BUFFER_SIZE from the current archive file is stored in the file $this->buffer_filename
- $buffer_fh
- in file text_archive_bundle_iterator.php, variable TextArchiveBundleIterator::$buffer_fh
If gzip is being used a buffer file is also employed to try to reduce the number of calls to gzseek. $buffer_fh is a filehandle for the buffer file
- $buffer_filename
- in file text_archive_bundle_iterator.php, variable TextArchiveBundleIterator::$buffer_filename
Name of a buffer file to be used to reduce gzseek calls in the case where gzip compression is being used
- badFormatMessageAndExit
- in file arc_tool.php, method ArcTool::badFormatMessageAndExit()
Outputs the "hey, this isn't a known bundle" message and then calls exit().
- banner
- in file configure_tool.php, method ConfigureTool::banner()
Prints the banner used by this configuration tool
- base64Hash
- in file utility.php, function base64Hash()
Converts a crawl hash number to something closer to base64 encoding, but in a form that doesn't get confused in urls or DBs
- baseLink
- in file search_controller.php, method SearchController::baseLink()
Used to create the base link for links to be displayed on caches of web pages. This link points to yioop because links on cache pages are to other cache pages
- BASE_DIR
- in file arc_tool.php, constant BASE_DIR
Calculate base directory of script
- beginMatch
- in file phrase_model.php, method PhraseModel::beginMatch()
Matches terms (non white-char strings) in the language $lang_tag in $phrase that begin with $start_with and don't contain $not_contain, replaces $start_with with $new_prefix and adds $suffix to the end
- BLANK
- in file index_shard.php, class constant IndexShard::BLANK
Represents an empty prefix item
- BLOCK_ENDMARK
- in file bzip2_block_iterator.php, class constant BZip2BlockIterator::BLOCK_ENDMARK
String at the end of each bz2 block
- BLOCK_HEADER
- in file bzip2_block_iterator.php, class constant BZip2BlockIterator::BLOCK_HEADER
String at the start of each bz2 block
- BLOCK_LEADER_RE
- in file bzip2_block_iterator.php, class constant BZip2BlockIterator::BLOCK_LEADER_RE
Blocks are NOT byte-aligned, so the block header (and endmark) may show up shifted right by 0-8 bits in various places throughout the file. This regular expression matches any of the possible shifts for both the block header and the block endmark.
- BLOCK_SIZE
- in file odp_rdf_bundle_iterator.php, class constant OdpRdfArchiveBundleIterator::BLOCK_SIZE
How many bytes to read into buffer from gzip stream in one go
- BLOCK_SIZE
- in file nword_grams.php, class constant NWordGrams::BLOCK_SIZE
- BLOCK_SIZE
- in file bmp_processor.php, class constant BmpProcessor::BLOCK_SIZE
Size in bytes of one block to read in of BMP
- BLOCK_SIZE
- in file bzip2_block_iterator.php, class constant BZip2BlockIterator::BLOCK_SIZE
How many bytes to read into buffer from bz2 stream in one go
- blog.php
- procedural page blog.php
- BloomFilterBundle
- in file bloom_filter_bundle.php, class BloomFilterBundle
A BloomFilterBundle is a directory of BloomFilterFiles.
- BloomFilterFile
- in file bloom_filter_file.php, class BloomFilterFile
Code used to manage a bloom filter in-memory and in file.
- BloomFilterFileTest
- in file bloom_filter_file_test.php, class BloomFilterFileTest
Used to test that the BloomFilterFile class provides the basic functionality of a persistent set. I.e., we can insert things into it, and we can do membership testing
- BmpProcessor
- in file bmp_processor.php, class BmpProcessor
Used to create crawl summary information for BMP and ICO files
- BMP_HEADER_LEN
- in file bmp_processor.php, class constant BmpProcessor::BMP_HEADER_LEN
Size in bytes of BMP header
- BMP_ID
- in file bmp_processor.php, class constant BmpProcessor::BMP_ID
Size in bytes of BMP identifier and size info
- boldKeywords
- in file model.php, method Model::boldKeywords()
Given a string, wraps in bold html tags a set of key words it contains.
- BOOST
- in file crawl_constants.php, class constant CrawlConstants::BOOST
- bot.php
- procedural page bot.php
- BOTH
- in file crawl_constants.php, class constant CrawlConstants::BOTH
Used to say what kind of queue_server this is
- BREADTH_FIRST
- in file crawl_constants.php, class constant CrawlConstants::BREADTH_FIRST
- BUFFER_SIZE
- in file text_archive_bundle_iterator.php, class constant TextArchiveBundleIterator::BUFFER_SIZE
How many bytes at a time should be read from the current archive
- buildMiniInvertedIndex
- in file fetcher.php, method Fetcher::buildMiniInvertedIndex()
Builds an inverted index shard (word --> {docs it appears in}) for the current batch of SEEN_URLS_BEFORE_UPDATE_SCHEDULER many pages.
- BZip2BlockIterator
- in file bzip2_block_iterator.php, class BZip2BlockIterator
This class is used to allow one to iterate through a Bzip2 file.
- bloom_filter_bundle.php
- procedural page bloom_filter_bundle.php
- bloom_filter_file.php
- procedural page bloom_filter_file.php
- bzip2_block_iterator.php
- procedural page bzip2_block_iterator.php
- bmp_processor.php
- procedural page bmp_processor.php
- bloom_filter_file_test.php
- procedural page bloom_filter_file_test.php
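The BloomFilterFile and BloomFilterFileTest entries in this section describe a set-like structure that supports insertion and membership testing. A minimal in-memory Python sketch of a Bloom filter follows, assuming bit positions derived from SHA-256 and hypothetical class, method, and parameter names. It does not reproduce Yioop's hashing scheme or its file persistence.

```python
import hashlib


class SimpleBloomFilter:
    """Illustrative Bloom filter: a fixed bit array plus several hash probes."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value):
        # Derive num_hashes bit positions by salting the value with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        # Set every bit the value hashes to.
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def contains(self, value):
        # True only if every probed bit is set; false positives are possible.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(value)
        )
```

A value that was added is always reported as present. A value that was never added is usually, but not always, reported as absent, which is the trade-off a Bloom filter makes for its compact size.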
c
- $cache_pages
- in file queue_server.php, variable QueueServer::$cache_pages
- $cache_pages
- in file fetcher.php, variable Fetcher::$cache_pages
Whether to cache pages or just the summaries
- $children
- in file epub_processor.php, variable EpubProcessor::$children
The child tag element of a tag element.
- $column_separator
- in file database_bundle_iterator.php, variable DatabaseBundleIterator::$column_separator
DB Records are imported as a text string where column_separator
- $compressor
- in file web_archive_bundle.php, variable WebArchiveBundle::$compressor
How Compressor object used to compress/uncompress data stored in
- $compressor
- in file web_archive.php, variable WebArchive::$compressor
Filter object used to compress/uncompress objects stored in archive
- $configure
- in file locale_model.php, variable LocaleModel::$configure
Used to store ini file data of the current locale
- $content
- in file epub_processor.php, variable EpubProcessor::$content
The content of the tag element or attribute, used to extract the fields like title, creator, language of the document
- $count
- in file hash_table.php, variable HashTable::$count
Number of items currently in the hash table
- $count
- in file web_archive.php, variable WebArchive::$count
Number of items in archive
- $count
- in file priority_queue.php, variable PriorityQueue::$count
Number of items that are currently stored in the queue
- $count
- in file web_archive_bundle.php, variable WebArchiveBundle::$count
Total number of page objects stored by this WebArchiveBundle
- $count
- in file web_archive_bundle_iterator.php, variable WebArchiveBundleIterator::$count
Number of documents in the web archive bundle being iterated over
- $count_block
- in file index_bundle_iterator.php, variable IndexBundleIterator::$count_block
The number of documents in the current block
- $count_block
- in file group_iterator.php, variable GroupIterator::$count_block
The number of documents in the current block after filtering
- $count_block_unfiltered
- in file group_iterator.php, variable GroupIterator::$count_block_unfiltered
The number of documents in the current block before filtering
- $count_block_unfiltered
- in file union_iterator.php, variable UnionIterator::$count_block_unfiltered
The number of documents in the current block before filtering
- $crawl_delay_filter
- in file web_queue_bundle.php, variable WebQueueBundle::$crawl_delay_filter
BloomFilter used to keep track of crawl delay in seconds for a given
- $crawl_index
- in file fetcher.php, variable Fetcher::$crawl_index
If the crawl_type is self::ARCHIVE_CRAWL, then crawl_index is the
- $crawl_index
- in file queue_server.php, variable QueueServer::$crawl_index
If the crawl_type is self::ARCHIVE_CRAWL, then crawl_index is the
- $crawl_order
- in file fetcher.php, variable Fetcher::$crawl_order
Stores the name of the ordering used to crawl pages. This is used in a switch/case when computing weights of urls to be crawled before sending these new urls back to a queue_server.
- $crawl_order
- in file queue_server.php, variable QueueServer::$crawl_order
Constant saying the method used to order the priority queue for the crawl
- $crawl_time
- in file fetcher.php, variable Fetcher::$crawl_time
Timestamp of the current crawl
- $crawl_time
- in file queue_server.php, variable QueueServer::$crawl_time
The timestamp of the current active crawl
- $crawl_type
- in file fetcher.php, variable Fetcher::$crawl_type
Indicates the kind of crawl being performed: self::WEB_CRAWL indicates
- $crawl_type
- in file queue_server.php, variable QueueServer::$crawl_type
Indicates the kind of crawl being performed: self::WEB_CRAWL indicates
- $cron_file
- in file cron_model.php, variable CronModel::$cron_file
File name used to store the cron table associative array
- $cron_table
- in file cron_model.php, variable CronModel::$cron_table
An associative array of key_name => timestamps used to indicate
- $current_block_fresh
- in file index_bundle_iterator.php, variable IndexBundleIterator::$current_block_fresh
Says whether the value in $this->count_block is up to date
- $current_block_hashes
- in file group_iterator.php, variable GroupIterator::$current_block_hashes
hashes of document web pages seen in results returned from the
- $current_filter
- in file bloom_filter_bundle.php, variable BloomFilterBundle::$current_filter
Reference to the filter which will be used to store new data
- $current_filter_count
- in file bloom_filter_bundle.php, variable BloomFilterBundle::$current_filter_count
The number of items which have been stored in the current filter
- $current_generation
- in file doc_iterator.php, variable DocIterator::$current_generation
Numeric number of current shard
- $current_generation
- in file word_iterator.php, variable WordIterator::$current_generation
Numeric number of current shard
- $current_machine
- in file group_iterator.php, variable GroupIterator::$current_machine
Id of queue_server this group_iterator lives on
- $current_machine
- in file parallel_model.php, variable ParallelModel::$current_machine
If known the id of the queue_server this belongs to
- $current_offset
- in file word_iterator.php, variable WordIterator::$current_offset
The current byte offset in the IndexShard
- $current_offset
- in file text_archive_bundle_iterator.php, variable TextArchiveBundleIterator::$current_offset
current byte offset into the current arc file
- $current_offset
- in file doc_iterator.php, variable DocIterator::$current_offset
The current byte offset in the IndexShard
- $current_page_num
- in file text_archive_bundle_iterator.php, variable TextArchiveBundleIterator::$current_page_num
current number of pages into the current arc file
- $current_partition_num
- in file text_archive_bundle_iterator.php, variable TextArchiveBundleIterator::$current_partition_num
Counting in glob order for this arc archive bundle directory, the current active file number of the arc file being processed.
- $current_partition_num
- in file web_archive_bundle_iterator.php, variable WebArchiveBundleIterator::$current_partition_num
Index of web archive in the web archive bundle that the iterator is
- $current_server
- in file fetcher.php, variable Fetcher::$current_server
Index into $queue_servers of the server to get the schedule from (or last one
- $current_shard
- in file index_archive_bundle.php, variable IndexArchiveBundle::$current_shard
Index Shard for current generation inverted word index
- code_tool.php
- procedural page code_tool.php
- cache
- in file archive_controller.php, method ArchiveController::cache()
Retrieves the requested page from the WebArchiveBundle and echo it page,
- cacheRequest
- in file search_controller.php, method SearchController::cacheRequest()
Part of Yioop! Search API. Performs a search query related to a given url and returns an associative array of query results
- cacheRequestAndOutput
- in file search_controller.php, method SearchController::cacheRequestAndOutput()
Used to get and render a cached web page
- CACHE_LINK
- in file config.php, constant CACHE_LINK
- CACHE_PAGES
- in file crawl_constants.php, class constant CrawlConstants::CACHE_PAGES
- CACHE_PAGE_PARTITION
- in file crawl_constants.php, class constant CrawlConstants::CACHE_PAGE_PARTITION
- CACHE_ROBOT_TXT_TIME
- in file config.php, constant CACHE_ROBOT_TXT_TIME
How long in seconds to keep a cache of a robots.txt
- calculateControlWords
- in file search_controller.php, method SearchController::calculateControlWords()
Extracts from the query string any control words: mix:, m:, raw:, no: and returns an array consisting of the query with these words removed, and then variables for their values.
- calculateLang
- in file text_processor.php, method TextProcessor::calculateLang()
Tries to determine the language of the document by looking at the
- calculateLinkMetas
- in file phrase_parser.php, method PhraseParser::calculateLinkMetas()
Used to compute all the meta ids for a given link with $url and $link_text that was on a site with $site_url.
- calculateMetas
- in file phrase_parser.php, method PhraseParser::calculateMetas()
Calculates the meta words to be associated with a given downloaded document. These words will be associated with the document in the index for (server:apache) even if the document itself did not contain them.
- calculateMetas
- in file source_model.php, method SourceModel::calculateMetas()
Used to calculate the meta words for RSS feed items
- calculatePartition
- in file utility.php, function calculatePartition()
Used by a controller to say which queue_server should receive
- calculateScheduleMetaInfo
- in file queue_server.php, method QueueServer::calculateScheduleMetaInfo()
Used to create and encode a string representing meta info for a fetcher schedule.
- canonicalizeLinks
- in file search_controller.php, method SearchController::canonicalizeLinks()
Makes relative links canonical with respect to the provided $url for links appearing within the Dom node.
- canonicalizePunctuatedTerms
- in file phrase_parser.php, method PhraseParser::canonicalizePunctuatedTerms()
This function tries to convert acronyms, e-mail addresses, urls, etc. into a format that does not involve punctuation that will be stripped as we extract phrases.
- canonicalLink
- in file url_parser.php, method UrlParser::canonicalLink()
Given a $link that was obtained from a website $site, returns a complete URL for that link.
- canonicalLinkTestCase
- in file url_parser_test.php, method UrlParserTest::canonicalLinkTestCase()
Checks if we can go from a relative link and a base link to a complete link
- case_name
- in file unit_test.php, class constant UnitTest::case_name
The suffix that all TestCase methods need to have to be called by run()
- changeCopyrightFile
- in file code_tool.php, function changeCopyrightFile()
Callback function applied to each file in the directory being traversed
- changeDocumentOffsets
- in file index_shard.php, method IndexShard::changeDocumentOffsets()
Changes the summary offsets associated with a set of doc_ids to new
- changeDocumentOffsetTestCase
- in file index_shard_test.php, method IndexShardTest::changeDocumentOffsetTestCase()
Check that changing document offsets works
- changeInMicrotime
- in file utility.php, function changeInMicrotime()
Measures the change in time in seconds between two timestamps to microsecond precision
- changePassword
- in file signin_model.php, method SigninModel::changePassword()
Changes the password of a given user
- charCopy
- in file utility.php, function charCopy()
Copies from $source string beginning at position $start, $length many bytes to destination string
- checkAllowedController
- in file index.php, function checkAllowedController()
Verifies that the supplied controller string is a controller for the SeekQuarry app
- checkAllZeros
- in file doc_processor.php, method DocProcessor::checkAllZeros()
Scans document starting at given position and looking forward eight characters to see if these are all \0 or not.
- checkArchiveScheduler
- in file fetcher.php, method Fetcher::checkArchiveScheduler()
During an archive crawl this method is used to get from the name server a collection of pages to process. The fetcher will later process these and send summaries to various queue_servers.
- checkCrawlTime
- in file fetcher.php, method Fetcher::checkCrawlTime()
Makes a request of the name server machine to get the timestamp of the currently running crawl to see if it changed
- checkCSRFToken
- in file controller.php, method Controller::checkCSRFToken()
Checks if the form CSRF (cross-site request forgery preventing) token matches the given user and has not expired (1 hour till expires)
- checkDescriptionTestCase
- in file pptx_processor_test.php, method PptxProcessorTest::checkDescriptionTestCase()
Checks if description is not null
- checkEof
- in file text_archive_bundle_iterator.php, method TextArchiveBundleIterator::checkEof()
Checks if this object's archive's current partition is at an end of file
- checkFileHandle
- in file text_archive_bundle_iterator.php, method TextArchiveBundleIterator::checkFileHandle()
Checks if we have a valid handle to this object's archive's current partition
- checkForSuffix
- in file tokenizer.php, method ItStemmer::checkForSuffix()
Checks if a string is a suffix for another string
- checkLangTestCase
- in file pptx_processor_test.php, method PptxProcessorTest::checkLangTestCase()
Checks whether the language of the pptx is correct
- checkLinksTestCase
- in file pptx_processor_test.php, method PptxProcessorTest::checkLinksTestCase()
Checks whether the links are correct
- checkPageForText
- in file doc_processor.php, method DocProcessor::checkPageForText()
Scans document starting at given position and looking forward eight characters to see if these are ASCII printable or not.
- checkQuote
- in file intersect_iterator.php, method IntersectIterator::checkQuote()
Auxiliary function for @see checkQuotes used to check if quoted terms in search query appear exactly in the position lists of the current document
- checkQuotes
- in file intersect_iterator.php, method IntersectIterator::checkQuotes()
Used to check if quoted terms in search query appear exactly in the position lists of the current document
- checkRecursiveUrl
- in file url_parser.php, method UrlParser::checkRecursiveUrl()
Checks if a url has a repeated set of subdirectories, and if the number of repeats occurs more than some threshold number of times
- checkRequest
- in file controller.php, method Controller::checkRequest()
Checks if a request is for a valid activity and if it uses the correct authorization key
- checkRobotOkay
- in file web_queue_bundle.php, method WebQueueBundle::checkRobotOkay()
Checks if the given $url is allowed to be crawled based on stored robots.txt info.
- checkSave
- in file persistent_structure.php, method PersistentStructure::checkSave()
Add one to the unsaved_operations count. If this goes above the
- checkScheduler
- in file fetcher.php, method Fetcher::checkScheduler()
Get status, current crawl, crawl order, and new site information from the queue_server.
- checkScheduler
- in file mirror.php, method Mirror::checkScheduler()
Gets status and, if done processing all other mirroring activities, gets a new list of files that have changed since the last synchronization from the web app of the machine we are mirroring with.
- checkSignin
- in file admin_controller.php, method AdminController::checkSignin()
Checks whether the user name and password sent presumably by the signin form match a user in the database
- checksum
- in file file_cache.php, method FileCache::checksum()
Makes a 0 - self::NUMBER_OF_BINS value out of the provided key
- checkTitleTestCase
- in file pptx_processor_test.php, method PptxProcessorTest::checkTitleTestCase()
Checks whether the title of the pptx is correct
- checkUpdateCrawlParameters
- in file queue_server.php, method QueueServer::checkUpdateCrawlParameters()
Checks to see if the parameters by which the active crawl are being
- checkValidSignin
- in file signin_model.php, method SigninModel::checkValidSignin()
Checks that a username password pair is valid
- citeCallback
- in file mediawiki_bundle_iterator.php, function citeCallback()
Used to convert {{cite }} to a numbered link to a citation
- clean
- in file code_tool.php, function clean()
- clean
- in file controller.php, method Controller::clean()
Used to clean strings that might be tainted as they originate from the user
- cleanLinesFile
- in file code_tool.php, function cleanLinesFile()
Callback function applied to each file in the directory being traversed by @see clean().
- cleanRedundantLinks
- in file url_parser.php, method UrlParser::cleanRedundantLinks()
Used to delete links from array of links $links based on whether they are the same as the site they came from (or otherwise judged irrelevant)
- cleanTextBlock
- in file doc_processor.php, method DocProcessor::cleanTextBlock()
Scans document starting at given position forward eight characters, returning those characters which are ASCII printable
- clearQuerySavePoint
- in file parallel_model.php, method ParallelModel::clearQuerySavePoint()
A save point is used to store to disk a sequence of generation-doc-offset pairs of a particular mix query when doing an archive crawl of a crawl mix. This is used so that the mix can remember where it was the next time it is invoked by the web app on the machine in question.
- clearQuerySavepoint
- in file search_controller.php, method SearchController::clearQuerySavepoint()
Query timestamps can be used to save an iteration position in a set of query results. This method allows one to delete the supplied save point.
- clearQuerySavePoint
- in file crawl_controller.php, method CrawlController::clearQuerySavePoint()
A save point is used to store to disk a sequence of generation-doc-offset pairs of a particular mix query when doing an archive crawl of a crawl mix. This is used so that the mix can remember where it was the next time it is invoked by the web app on the machine in question.
- close
- in file web_archive.php, method WebArchive::close()
Closes a file handle (which should be of a web archive)
- close
- in file bzip2_block_iterator.php, method BZip2BlockIterator::close()
Used to close the file associated with this iterator
- closeDanglingTags
- in file text_processor.php, method TextProcessor::closeDanglingTags()
If an end of file is reached before closed tags are seen, this method closes these tags in the correct order.
- closeUrlArchive
- in file web_queue_bundle.php, method WebQueueBundle::closeUrlArchive()
Closes a file handle to the url WebArchive
- Cluster
- in file recipe_plugin.php, class Cluster
heap to maintain the MST
- CLUSTER_RATIO
- in file recipe_plugin.php, constant CLUSTER_RATIO
Ratio of clusters/total number of recipes seen
- combinedCrawlInfo
- in file crawl_controller.php, method CrawlController::combinedCrawlInfo()
Handles a request for the combined crawl list, stalled, and status
- combinedCrawlInfo
- in file crawl_model.php, method CrawlModel::combinedCrawlInfo()
This method is used to reduce the number of network requests
- compare
- in file recipe_plugin.php, method TreeCluster::compare()
- compare
- in file recipe_plugin.php, method Cluster::compare()
- compare
- in file priority_queue.php, method PriorityQueue::compare()
Computes the difference of the two values $value1 and $value2
- completeFillTestCase
- in file hash_table_test.php, method HashTableTest::completeFillTestCase()
Completely fill table. Next insert should fail. Then delete all the
- compress
- in file non_compressor.php, method NonCompressor::compress()
Applies the Compressor compress filter to a string before it is inserted into a WebArchive. In this case, the filter does nothing.
- compress
- in file compressor.php, method Compressor::compress()
Applies the Compressor compress filter to a string before it is inserted into a WebArchive.
- compress
- in file gzip_compressor.php, method GzipCompressor::compress()
Applies the Compressor compress filter to a string before it is inserted into a WebArchive. In this case, applying the filter means gzipping.
- compressAndUnsetSeenUrls
- in file fetcher.php, method Fetcher::compressAndUnsetSeenUrls()
Computes a string of compressed urls from the seen urls and extracted links destined for the current queue server. Then unsets these values from $this->found_sites
- compressedIntLen
- in file non_compressor.php, method NonCompressor::compressedIntLen()
Computes the length of an int when packed using the underlying compression algorithm as a fixed length string. The pack function stores ints as 4 byte strings
- compressedIntLen
- in file compressor.php, method Compressor::compressedIntLen()
Computes the length of an int when packed using the underlying
- compressedIntLen
- in file gzip_compressor.php, method GzipCompressor::compressedIntLen()
Computes the length of an int when packed using the underlying
- compressInt
- in file gzip_compressor.php, method GzipCompressor::compressInt()
Used to compress an int as a fixed length string in the format of
- compressInt
- in file compressor.php, method Compressor::compressInt()
Used to compress an int as a fixed length string in the format of the compression algorithm underlying the compressor.
- compressInt
- in file non_compressor.php, method NonCompressor::compressInt()
Used to compress an int as a fixed length string in the format of the compression algorithm underlying the compressor. Since this compressor doesn't compress we just use pack
- Compressor
- in file compressor.php, class Compressor
A Compressor is used to apply a filter to objects before they are stored into a WebArchive. The filter is assumed to be invertible, and the typical intention is the filter carries out some kind of string compression.
- computeOutPages
- in file group_iterator.php, method GroupIterator::computeOutPages()
For a collection of grouped pages generates a grouped summary for each group and returns an array of out pages consisting of single summarized documents for each group. These single summarized documents have aggregated scores.
- computePageHash
- in file fetch_url.php, method FetchUrl::computePageHash()
Computes a hash of a string containing page data for use in deduplication of pages with similar content
- computeProximity
- in file index_shard.php, method IndexShard::computeProximity()
Returns a proximity score for a single term based on its location in doc.
- computeProximity
- in file intersect_iterator.php, method IntersectIterator::computeProximity()
Given the position_lists of a collection of terms computes a score for how close those words were in the given document
- computeRelevance
- in file word_iterator.php, method WordIterator::computeRelevance()
Computes a relevancy score for a posting offset with respect to this
- computeRelevance
- in file negation_iterator.php, method NegationIterator::computeRelevance()
Computes a relevancy score for a posting offset with respect to this
- computeRelevance
- in file union_iterator.php, method UnionIterator::computeRelevance()
Computes a relevancy score for a posting offset with respect to this
- computeRelevance
- in file network_iterator.php, method NetworkIterator::computeRelevance()
Computes a relevancy score for a posting offset with respect to this
- computeRelevance
- in file group_iterator.php, method GroupIterator::computeRelevance()
Computes a relevancy score for a posting offset with respect to this
- computeRelevance
- in file index_bundle_iterator.php, method IndexBundleIterator::computeRelevance()
Computes a relevancy score for a posting offset with respect to this
- computeRelevance
- in file doc_iterator.php, method DocIterator::computeRelevance()
Computes a relevancy score for a posting offset with respect to this iterator and generation. This method is required from the base class, but as a doc iterator doesn't use a posting list, we just return 1.0 for this iterator.
- computeRelevance
- in file intersect_iterator.php, method IntersectIterator::computeRelevance()
Computes a relevancy score for a posting offset with respect to this
- computeSafeSearchScore
- in file phrase_parser.php, method PhraseParser::computeSafeSearchScore()
Scores documents according to the absence or presence of sexually explicit terms. Tries to work for several languages.
- computeSafeSearchScoreTestCase
- in file phrase_parser_test.php, method PhraseParserTest::computeSafeSearchScoreTestCase()
Checks whether the same search threshold can classify porn from
- computeStatistics
- in file statistics_controller.php, method StatisticsController::computeStatistics()
Runs the queries necessary to determine httpd code distribution,
- computeTopicLinks
- in file odp_rdf_bundle_iterator.php, method OdpRdfArchiveBundleIterator::computeTopicLinks()
Computes links for prefix topics of an ODP topic path
- config.php
- procedural page config.php
- configure_tool.php
- procedural page configure_tool.php
- createdb.php
- procedural page createdb.php
- configure
- in file admin_controller.php, method AdminController::configure()
Responsible for handling admin request related to the configure activity
- ConfigureElement
- in file configure_element.php, class ConfigureElement
Element responsible for drawing the screen used to set up the search engine
- configureMenu
- in file configure_tool.php, method ConfigureTool::configureMenu()
This is used to draw the main configuration menu and ask for a
- configureRequest
- in file admin_controller.php, method AdminController::configureRequest()
If there is no profile/work directory set up then this method gets called to bypass any login and go to the configure screen.
- ConfigureTool
- in file configure_tool.php, class ConfigureTool
Provides a command-line interface to configure a Yioop instance.
- confirmChange
- in file configure_tool.php, method ConfigureTool::confirmChange()
Used to select to confirm, cancel, or re-enter the last profile change
- connect
- in file sqlite_manager.php, method SqliteManager::connect()
For an Sqlite database no connection needs to be made so this
- connect
- in file datasource_manager.php, method DatasourceManager::connect()
Connects to a DBMS using data provided or from config.php
- connect
- in file sqlite3_manager.php, method Sqlite3Manager::connect()
For an Sqlite3 database no connection needs to be made so this
- connect
- in file pdo_manager.php, method PdoManager::connect()
- connect
- in file mysql_manager.php, method MysqlManager::connect()
- constructHashTable
- in file web_queue_bundle.php, method WebQueueBundle::constructHashTable()
Mainly, a Factory style wrapper around the HashTable's constructor.
- constructMST
- in file recipe_plugin.php, method Tree::constructMST()
constructs the adjacency matrix for the MST.
- construct_tree
- in file recipe_plugin.php, function construct_tree()
creates a tree from the input and applies Kruskal's algorithm to find the MST.
- contains
- in file bloom_filter_file.php, method BloomFilterFile::contains()
Checks if the BloomFilter contains the provided $value
- containsGotRobotTxt
- in file web_queue_bundle.php, method WebQueueBundle::containsGotRobotTxt()
Checks if we have a fresh copy of robots.txt info for $host
- containsUrlQueue
- in file web_queue_bundle.php, method WebQueueBundle::containsUrlQueue()
Checks if the url queue already contains the given url
- CONTINUE_STATE
- in file crawl_constants.php, class constant CrawlConstants::CONTINUE_STATE
- Controller
- in file controller.php, class Controller
Base controller class for all controllers on the SeekQuarry site.
- controller.php
- procedural page controller.php
- crawl_controller.php
- procedural page crawl_controller.php
- convertArrayLines
- in file admin_controller.php, method AdminController::convertArrayLines()
Converts an array of lines of strings into a single string with proper newlines, each line having been trimmed and potentially cleaned
- convertPixels
- in file utility.php, function convertPixels()
Converts a CSS unit string into its equivalent in pixels. This is used by @see SvgProcessor.
- convertStringCleanArray
- in file admin_controller.php, method AdminController::convertStringCleanArray()
Cleans a string consisting of lines, typically of urls into an array of clean lines. This is used in handling data from the crawl options text areas.
- copyNextSyncFile
- in file mirror.php, method Mirror::copyNextSyncFile()
Downloads the next file from the schedule of files to download received from the web app.
- copyProfileFields
- in file configure_tool.php, method ConfigureTool::copyProfileFields()
Used to copy the contents of $data which are profile fields to a new array.
- copyRecursive
- in file datasource_manager.php, method DatasourceManager::copyRecursive()
Recursively copies a source directory to a destination directory
- copyright
- in file code_tool.php, function copyright()
Updates the copyright info (assuming in Yioop docs format) on files in supplied sub-folder/file. That is, it changes strings matching /2009 - \d\d\d\d/ to 2009 - current_year in those files/file.
- copySiteFields
- in file fetcher.php, method Fetcher::copySiteFields()
Copies fields from the array of site data to the $i indexed element of the $summarized_site_pages and $stored_site_pages array
- copyTable
- in file profile_model.php, method ProfileModel::copyTable()
Copies the contents of table in the first database into the same named table in a second database. It assumes the table exists in both databases
- countCompanyLevelDomainsInCommon
- in file fetcher.php, method Fetcher::countCompanyLevelDomainsInCommon()
Returns the number of links in the array $links which
- countQuery
- in file statistics_controller.php, method StatisticsController::countQuery()
Performs the provided $query of a web crawl (potentially distributed across queue servers). Returns the count of the number of results that would be returned by that query.
- countWordKeys
- in file index_archive_bundle.php, method IndexArchiveBundle::countWordKeys()
Computes the number of occurrences of each of the supplied list of word_keys
- countWords
- in file crawl_model.php, method CrawlModel::countWords()
Computes for each word in an array of words a count of the total number of times it occurs in this crawl model's default index.
- countWords
- in file crawl_controller.php, method CrawlController::countWords()
Receives a request to get counts of the number of occurrences of an
- CrawlConstants
- in file crawl_constants.php, class CrawlConstants
Shared constants and enums used by components that are involved in the crawling process
- CrawlController
- in file crawl_controller.php, class CrawlController
Controller used to manage networked installations of Yioop where
- crawlCrypt
- in file utility.php, function crawlCrypt()
The search engine project's variation on the Unix crypt function using the crawlHash function instead of DES
- CrawlDaemon
- in file crawl_daemon.php, class CrawlDaemon
Used to run scripts as a daemon on *nix systems. To use CrawlDaemon, one needs to declare ticks first in a scope that won't go away after CrawlDaemon:init is called
- crawlHash
- in file utility.php, function crawlHash()
Computes an 8 byte hash of a string for use in storing documents.
- crawlItemSummary
- in file search_controller.php, method SearchController::crawlItemSummary()
Generates a string representation of a crawl item suitable for output in a cache page
- crawlLog
- in file utility.php, function crawlLog()
Logs a message to a logfile or the screen
- CrawlModel
- in file crawl_model.php, class CrawlModel
This class is used to handle db results for a given phrase search
- CrawloptionsElement
- in file crawloptions_element.php, class CrawloptionsElement
Element responsible for displaying options about how a crawl will be performed. For instance, what are the seed sites for the crawl, what sites are allowed to be crawled, what sites must not be crawled, etc.
- crawlStalled
- in file crawl_controller.php, method CrawlController::crawlStalled()
Handles a request for whether or not the crawl is stalled on the
- crawlStalled
- in file crawl_model.php, method CrawlModel::crawlStalled()
Determines if the length of time since any of the fetchers has spoken with any of the queue_servers has exceeded CRAWL_TIME_OUT. If so, typically the caller of this method would do something such as officially stop the crawl.
- crawlStatus
- in file admin_controller.php, method AdminController::crawlStatus()
Used to handle crawlStatus REST activities requesting the status of the current web crawl
- crawlStatus
- in file crawl_model.php, method CrawlModel::crawlStatus()
Returns data about current crawl such as DESCRIPTION, TIMESTAMP, peak memory of various processes, most recent fetcher, most recent urls, urls seen, urls visited, etc.
- crawlStatus
- in file crawl_controller.php, method CrawlController::crawlStatus()
Handles a request for the crawl status (memory use, recent fetchers
- CrawlstatusView
- in file crawlstatus_view.php, class CrawlstatusView
This view is used to display information about crawls that have been made by this seek_quarry instance
- crawlTime
- in file fetch_controller.php, method FetchController::crawlTime()
Checks for the crawl time according either to crawl_status.txt or to network_status.txt, and presents it to the requesting fetcher, along with a list of available queue servers.
- CRAWL_DELAY
- in file crawl_constants.php, class constant CrawlConstants::CRAWL_DELAY
- CRAWL_DIR
- in file config.php, constant CRAWL_DIR
- CRAWL_INDEX
- in file crawl_constants.php, class constant CrawlConstants::CRAWL_INDEX
- CRAWL_ORDER
- in file crawl_constants.php, class constant CrawlConstants::CRAWL_ORDER
- CRAWL_TIME
- in file crawl_constants.php, class constant CrawlConstants::CRAWL_TIME
- CRAWL_TIME_OUT
- in file config.php, constant CRAWL_TIME_OUT
Number of seconds of no fetcher contact before crawl is deemed dead
- CRAWL_TYPE
- in file crawl_constants.php, class constant CrawlConstants::CRAWL_TYPE
- createDatabaseTables
- in file profile_model.php, method ProfileModel::createDatabaseTables()
On a blank database this method creates all the tables necessary for Yioop
- createDomBoxNode
- in file search_controller.php, method SearchController::createDomBoxNode()
Creates a bordered tag (usually div) in which to put meta content on a page when it is displayed
- createHistoryDataStructure
- in file search_controller.php, method SearchController::createHistoryDataStructure()
Creates a data structure for storing years, months and associated
- createIfNecessaryDirectory
- in file profile_model.php, method ProfileModel::createIfNecessaryDirectory()
Creates a directory and sets it to world permission if it doesn't already exist
- createLinkDivs
- in file search_controller.php, method SearchController::createLinkDivs()
Create divs for links based on all (year, month) combinations
- createSummaryAndToggleNodes
- in file search_controller.php, method SearchController::createSummaryAndToggleNodes()
Creates the toggle link and hidden div for extracted header and summary element on cache pages
- createThumb
- in file svg_processor.php, method SvgProcessor::createThumb()
Used to create an svg thumbnail from a dom object
- createThumb
- in file image_processor.php, method ImageProcessor::createThumb()
Used to create a thumbnail from an image object
- CronModel
- in file cron_model.php, class CronModel
Used to remember the last time the web app ran periodic activities
- CRON_INTERVAL
- in file fetch_controller.php, class constant FetchController::CRON_INTERVAL
Number of seconds that must elapse after last call before doing
- CSRF_TOKEN
- in file config.php, constant CSRF_TOKEN
- currentDocsWithWord
- in file index_bundle_iterator.php, method IndexBundleIterator::currentDocsWithWord()
Gets the current block of doc ids and scores associated with this iterator's word
- currentGenDocOffsetWithWord
- in file group_iterator.php, method GroupIterator::currentGenDocOffsetWithWord()
Gets the doc_offset and generation for the next document that would be returned by this iterator
- currentGenDocOffsetWithWord
- in file intersect_iterator.php, method IntersectIterator::currentGenDocOffsetWithWord()
Gets the doc_offset and generation for the next document that would be returned by this iterator
- currentGenDocOffsetWithWord
- in file word_iterator.php, method WordIterator::currentGenDocOffsetWithWord()
Gets the doc_offset and generation for the next document that would be returned by this iterator
- currentGenDocOffsetWithWord
- in file negation_iterator.php, method NegationIterator::currentGenDocOffsetWithWord()
Gets the doc_offset and generation for the next document that would be returned by this iterator
- currentGenDocOffsetWithWord
- in file network_iterator.php, method NetworkIterator::currentGenDocOffsetWithWord()
Gets the doc_offset and generation for the next document that
- currentGenDocOffsetWithWord
- in file index_bundle_iterator.php, method IndexBundleIterator::currentGenDocOffsetWithWord()
Gets the doc_offset and generation for the next document that would be returned by this iterator
- currentGenDocOffsetWithWord
- in file doc_iterator.php, method DocIterator::currentGenDocOffsetWithWord()
Gets the doc_offset and generation for the next document that would be returned by this iterator
- currentGenDocOffsetWithWord
- in file union_iterator.php, method UnionIterator::currentGenDocOffsetWithWord()
This method is supposed to get the doc_offset and generation
- currentObjects
- in file web_archive.php, method WebArchive::currentObjects()
Returns $num many objects from the web archive starting at the current iterator position, leaving the iterator position unchanged
- CURRENT_SERVER
- in file crawl_constants.php, class constant CrawlConstants::CURRENT_SERVER
- compressor.php
- procedural page compressor.php
- crawl_constants.php
- procedural page crawl_constants.php
- crawl_daemon.php
- procedural page crawl_daemon.php
- crawl_model.php
- procedural page crawl_model.php
- cron_model.php
- procedural page cron_model.php
- crawlstatus_view.php
- procedural page crawlstatus_view.php
- configure_element.php
- procedural page configure_element.php
- crawloptions_element.php
- procedural page crawloptions_element.php
d
- $data_size
- in file string_array.php, variable StringArray::$data_size
Size of each item in bytes to be stored
- $db
- in file database_bundle_iterator.php, variable DatabaseBundleIterator::$db
File handle for current arc file
- $db
- in file model.php, variable Model::$db
Reference to a DatasourceManager
- $db
- in file indexing_plugin.php, variable IndexingPlugin::$db
Reference to a database object that might be used by models on this
- $db
- in file fetcher.php, variable Fetcher::$db
Reference to a database object. Used since has directory manipulation
- $db
- in file mirror.php, variable Mirror::$db
Reference to a database object. Used since has directory manipulation
- $db
- in file queue_server.php, variable QueueServer::$db
Reference to a database object. Used since has directory manipulation
- $db
- in file web_queue_bundle_test.php, variable WebQueueBundleTest::$db
our dbms manager handle so we can call unlinkRecursive
- $dbhandle
- in file sqlite_manager.php, variable SqliteManager::$dbhandle
Stores the current Sqlite DB resource
- $dbhandle
- in file sqlite3_manager.php, variable Sqlite3Manager::$dbhandle
Stores the current Sqlite3 DB object
- $dbname
- in file sqlite3_manager.php, variable Sqlite3Manager::$dbname
Filename of the Sqlite3 Database
- $dbname
- in file sqlite_manager.php, variable SqliteManager::$dbname
Filename of the Sqlite Database
- $db_name
- in file model.php, variable Model::$db_name
Name of the search engine database
- $default_configure
- in file locale_model.php, variable LocaleModel::$default_configure
Used to store ini file data of the default locale (will use if no
- $deleted
- in file hash_table.php, variable HashTable::$deleted
Holds \0\0 followed by an all \FF string of length $this->key_size - 1. Used to indicate that a slot once held data but that data was deleted.
- $description
- in file index_archive_bundle.php, variable IndexArchiveBundle::$description
A short text name for this IndexArchiveBundle
- $description
- in file web_archive_bundle.php, variable WebArchiveBundle::$description
A short text name for this WebArchiveBundle
- $dictionary
- in file index_archive_bundle.php, variable IndexArchiveBundle::$dictionary
IndexDictionary for all shards in the IndexArchiveBundle
- $dictionary_info
- in file word_iterator.php, variable WordIterator::$dictionary_info
An array of shard generation and posting list offsets, lengths, and
- $dir_name
- in file file_cache.php, variable FileCache::$dir_name
Folder name to use for this FileCache
- $dir_name
- in file web_queue_bundle.php, variable WebQueueBundle::$dir_name
The folder name of this WebQueueBundle
- $dir_name
- in file searchfilters_model.php, variable SearchfiltersModel::$dir_name
Directory in which to put filter
- $dir_name
- in file web_archive_bundle.php, variable WebArchiveBundle::$dir_name
Folder name to use for this WebArchiveBundle
- $dir_name
- in file bloom_filter_bundle.php, variable BloomFilterBundle::$dir_name
The folder name of this filter bundle
- $dir_name
- in file index_archive_bundle.php, variable IndexArchiveBundle::$dir_name
Folder name to use for this IndexArchiveBundle
- $dir_name
- in file index_dictionary.php, variable IndexDictionary::$dir_name
Folder name to use for this IndexDictionary
- $disallowed_sites
- in file fetcher.php, variable Fetcher::$disallowed_sites
Web-sites that the crawler must not crawl
- $disallowed_sites
- in file queue_server.php, variable QueueServer::$disallowed_sites
Web-sites that the crawler must not crawl
- $dns_table
- in file web_queue_bundle.php, variable WebQueueBundle::$dns_table
host-ip table used for dns look-up, comes from robot.txt data and
- $docids_len
- in file index_shard.php, variable IndexShard::$docids_len
Length of $doc_infos as a string
- $doc_infos
- in file index_shard.php, variable IndexShard::$doc_infos
Stores document id's and links to documents id's together with
- $domain_factors
- in file group_iterator.php, variable GroupIterator::$domain_factors
Used to keep track and to weight pages based on the number of other
- DATA
- in file crawl_constants.php, class constant CrawlConstants::DATA
- DatabaseBundleIterator
- in file database_bundle_iterator.php, class DatabaseBundleIterator
Used to iterate through the records that result from an SQL query to a database
- DatasourceManager
- in file datasource_manager.php, class DatasourceManager
This abstract class defines the interface through which the seek_quarry program communicates with a database and the filesystem.
- data_base_name
- in file crawl_constants.php, class constant CrawlConstants::data_base_name
- DBMS
- in file config.php, constant DBMS
- DB_HOST
- in file config.php, constant DB_HOST
- DB_NAME
- in file config.php, constant DB_NAME
- DB_PASSWORD
- in file config.php, constant DB_PASSWORD
- DB_USER
- in file config.php, constant DB_USER
- debugDisplay
- in file configure_tool.php, method ConfigureTool::debugDisplay()
Used to configure debugging information for this Yioop instance.
- DEBUG_LEVEL
- in file config.php, constant DEBUG_LEVEL
- decodeModified9
- in file utility.php, function decodeModified9()
Decodes a sequence of positive integers from a string that has been encoded using Modified 9
- deDeltaList
- in file utility.php, function deDeltaList()
Given an array of differences of integers reconstructs the original list. This computes the inverse of the deltaList function
- defaultLocale
- in file configure_tool.php, method ConfigureTool::defaultLocale()
Changes the default locale (language) used by Yioop when it cannot
- DEFAULT_DESCRIPTION_LENGTH
- in file model.php, class constant Model::DEFAULT_DESCRIPTION_LENGTH
Default maximum character length of a search summary
- default_filter_size
- in file bloom_filter_bundle.php, class constant BloomFilterBundle::default_filter_size
The default maximum size of a filter in a filter bundle
- DEFAULT_LOCALE
- in file config.php, constant DEFAULT_LOCALE
- DEFAULT_POST_MAX_SIZE
- in file fetcher.php, class constant Fetcher::DEFAULT_POST_MAX_SIZE
Before receiving any data from a queue server's web app this is
- DEFAULT_SAVE_FREQUENCY
- in file persistent_structure.php, class constant PersistentStructure::DEFAULT_SAVE_FREQUENCY
If not specified in the constructor, this will be the number of
- delete
- in file hash_table.php, method HashTable::delete()
Deletes the data associated with the provided key from the hash table
- deleteActivityRole
- in file role_model.php, method RoleModel::deleteActivityRole()
Remove an allowed activity from a role
- deleteCrawl
- in file crawl_controller.php, method CrawlController::deleteCrawl()
Receives a request to delete a crawl from a remote name server
- deleteCrawl
- in file crawl_model.php, method CrawlModel::deleteCrawl()
Deletes the crawl with the supplied timestamp if it exists. Also deletes any crawl mixes making use of this crawl
- deleteCrawlMix
- in file crawl_model.php, method CrawlModel::deleteCrawlMix()
Stores in DB the supplied crawl mix object
- deleteFeedItems
- in file source_model.php, method SourceModel::deleteFeedItems()
Copies all feeds items newer than $age to a new shard, then deletes
- deleteFileOrDir
- in file utility.php, function deleteFileOrDir()
This is a callback function used in the process of recursively deleting a directory
- deleteHashTable
- in file web_queue_bundle.php, method WebQueueBundle::deleteHashTable()
Removes an entry from the to-crawl hash table
- deleteLocale
- in file locale_model.php, method LocaleModel::deleteLocale()
Remove a locale from the database
- deleteMachine
- in file machine_model.php, method MachineModel::deleteMachine()
Delete a machine by its name
- deleteMediaSource
- in file source_model.php, method SourceModel::deleteMediaSource()
Deletes the media source whose id is the given timestamp
- deleteOldCrawls
- in file fetcher.php, method Fetcher::deleteOldCrawls()
Deletes any crawl web archive bundles not in the provided array of crawls
- deleteOrphanedBundles
- in file queue_server.php, method QueueServer::deleteOrphanedBundles()
Delete all the queue bundles and schedules that don't have an associated index bundle as this means that crawl has been deleted.
- deleteRobotData
- in file queue_server.php, method QueueServer::deleteRobotData()
Deletes all robot information stored by the QueueServer.
- deleteRole
- in file role_model.php, method RoleModel::deleteRole()
Delete a role by its roleid
- deleteSeenUrls
- in file queue_server.php, method QueueServer::deleteSeenUrls()
Removes the already seen urls from the supplied array
- deleteSubsearch
- in file source_model.php, method SourceModel::deleteSubsearch()
Deletes a subsearch from the subsearch table and removes its associated translations
- deleteUser
- in file user_model.php, method UserModel::deleteUser()
Deletes a user by username from the list of users that can login to the admin panel
- deleteUserRole
- in file user_model.php, method UserModel::deleteUserRole()
Deletes a role from a given user
- deltaList
- in file utility.php, function deltaList()
Computes the difference of a list of integers.
- dequeue
- in file recipe_plugin.php, method Queue::dequeue()
- description
- in file html_processor.php, method HtmlProcessor::description()
Returns descriptive text concerning a webpage based on its document object
- description
- in file pptx_processor.php, method PptxProcessor::description()
Returns descriptive text concerning a pptx slide based on its document object
- description
- in file rss_processor.php, method RssProcessor::description()
Returns descriptive text concerning a webpage based on its document object
- description
- in file xlsx_processor.php, method XlsxProcessor::description()
Returns descriptive text concerning a xlsx file based on its document object
- DESCRIPTION
- in file crawl_constants.php, class constant CrawlConstants::DESCRIPTION
- description
- in file svg_processor.php, method SvgProcessor::description()
Returns descriptive text concerning a svg page based on its document object
- descriptionTestCase
- in file xlsx_processor_test.php, method XlsxProcessorTest::descriptionTestCase()
Tests that the description is correct
- DESCRIPTION_LENGTH
- in file crawl_constants.php, class constant CrawlConstants::DESCRIPTION_LENGTH
- DESCRIPTION_WEIGHT
- in file config.php, constant DESCRIPTION_WEIGHT
BM25F weight for other text within doc
- DESCRIPTION_WORDS
- in file crawl_constants.php, class constant CrawlConstants::DESCRIPTION_WORDS
- DESCRIPTION_WORD_SCORE
- in file crawl_constants.php, class constant CrawlConstants::DESCRIPTION_WORD_SCORE
- DICT_BLOCK_POWER
- in file index_dictionary.php, class constant IndexDictionary::DICT_BLOCK_POWER
Disk block size is 1<< this power
- DICT_BLOCK_SIZE
- in file index_dictionary.php, class constant IndexDictionary::DICT_BLOCK_SIZE
Size in bytes of one block in IndexDictionary
- differenceFilter
- in file bloom_filter_bundle.php, method BloomFilterBundle::differenceFilter()
Removes from the passed array those elements $elt who either are in the filter bundle or whose $elt[$field_name] is in the bundle.
- differenceSeenUrls
- in file web_queue_bundle.php, method WebQueueBundle::differenceSeenUrls()
Removes all url objects from $url_array which have been seen
- disallowedToCrawlSite
- in file queue_server.php, method QueueServer::disallowedToCrawlSite()
Checks if url belongs to a list of sites that aren't supposed to be crawled
- disallowedToCrawlSiteTestCase
- in file queue_server_test.php, method QueueServerTest::disallowedToCrawlSiteTestCase()
disallowedToCrawlSite checks if a url matches a list of url
- DISALLOWED_SITES
- in file crawl_constants.php, class constant CrawlConstants::DISALLOWED_SITES
- disconnect
- in file sqlite3_manager.php, method Sqlite3Manager::disconnect()
- disconnect
- in file sqlite_manager.php, method SqliteManager::disconnect()
- disconnect
- in file datasource_manager.php, method DatasourceManager::disconnect()
Closes connections to DBMS
- disconnect
- in file pdo_manager.php, method PdoManager::disconnect()
- disconnect
- in file mysql_manager.php, method MysqlManager::disconnect()
- DisplayresultsHelper
- in file displayresults_helper.php, class DisplayresultsHelper
This is a helper class used to handle displaying descriptions. If it has recipe data, each ingredient is displayed on a separate line.
- displayView
- in file controller.php, method Controller::displayView()
Send the provided view to output, drawing it with the given data variable, using the current locale for translation, and writing mode
- DISPLAY_TESTS
- in file config.php, constant DISPLAY_TESTS
if true, tests are displayable
- dnsLookup
- in file web_queue_bundle.php, method WebQueueBundle::dnsLookup()
Add an entry to the web_queue_bundles DNS cache
- DNS_TIME
- in file crawl_constants.php, class constant CrawlConstants::DNS_TIME
- docIndexModified9
- in file utility.php, function docIndexModified9()
Given an int encoding a doc_index followed by a position list using Modified 9, extracts just the doc_index.
- DocIterator
- in file doc_iterator.php, class DocIterator
Used to iterate through all the documents and links associated with an IndexArchiveBundle. It iterates through each doc or link regardless of the words it contains. It also makes it easy to get the summaries of these documents.
- docOffsetFromPostingOffset
- in file index_shard.php, method IndexShard::docOffsetFromPostingOffset()
Given an offset of a posting into the word_docs string, looks up the posting there and computes the doc_offset stored in it.
- DocProcessor
- in file doc_processor.php, class DocProcessor
Used to create crawl summary information for binary DOC files
- doCronTasks
- in file fetch_controller.php, method FetchController::doCronTasks()
Used to do periodic maintenance tasks for the Name Server.
- docStats
- in file index_shard.php, method IndexShard::docStats()
Computes BM25F relevance and a score for the supplied item based on the supplied parameters.
- DOC_DEPTH
- in file crawl_constants.php, class constant CrawlConstants::DOC_DEPTH
- DOC_ID
- in file crawl_constants.php, class constant CrawlConstants::DOC_ID
- DOC_INFO
- in file crawl_constants.php, class constant CrawlConstants::DOC_INFO
- DOC_KEY_LEN
- in file index_shard.php, class constant IndexShard::DOC_KEY_LEN
Length of a key in a DOC ID.
- DOC_LEN
- in file crawl_constants.php, class constant CrawlConstants::DOC_LEN
- DOC_RANK
- in file crawl_constants.php, class constant CrawlConstants::DOC_RANK
- dom
- in file xml_processor.php, method XmlProcessor::dom()
Return a document object based on a string containing the contents of an XML page
- dom
- in file html_processor.php, method HtmlProcessor::dom()
Return a document object based on a string containing the contents of a web page
- dom
- in file xlsx_processor.php, method XlsxProcessor::dom()
Return a document object based on a string containing the contents of a xml file
- dom
- in file sitemap_processor.php, method SitemapProcessor::dom()
Return a document object based on a string containing the contents of an RSS page
- dom
- in file rss_processor.php, method RssProcessor::dom()
Return a document object based on a string containing the contents of an RSS page
- dom
- in file svg_processor.php, method SvgProcessor::dom()
Return a document object based on a string containing the contents of an SVG page
- dom
- in file pptx_processor.php, method PptxProcessor::dom()
Return a document object based on a string containing the contents of a web page
- DOMAIN_WEIGHTS
- in file crawl_constants.php, class constant CrawlConstants::DOMAIN_WEIGHTS
- downloadPagesArchiveCrawl
- in file fetcher.php, method Fetcher::downloadPagesArchiveCrawl()
Extracts NUM_MULTI_CURL_PAGES from the current Archive Bundle that is being recrawled.
- downloadPagesWebCrawl
- in file fetcher.php, method Fetcher::downloadPagesWebCrawl()
Get a list of urls from the current fetch batch provided by the queue server. Then downloads these pages. Finally, reschedules, if possible, pages that did not successfully get downloaded.
- DOWNLOAD_ERROR_THRESHOLD
- in file config.php, constant DOWNLOAD_ERROR_THRESHOLD
Number of error page 400 or greater seen from a host before crawl-delay
- DOWNLOAD_RANGE
- in file mirror.php, class constant Mirror::DOWNLOAD_RANGE
Maximum number of bytes from a file to download in one go
- DOWNLOAD_SIZE_INTERVAL
- in file config.php, constant DOWNLOAD_SIZE_INTERVAL
Used to say number of bytes in histogram bar for file download sizes
- DOWNLOAD_TIME_INTERVAL
- in file config.php, constant DOWNLOAD_TIME_INTERVAL
Used to say number of secs in histogram bar for file download times
- drawChooseItems
- in file configure_tool.php, method ConfigureTool::drawChooseItems()
Draws a list of options to the screen and gets a choice from this list from the user.
- DUMMY
- in file crawl_constants.php, class constant CrawlConstants::DUMMY
- dumpQueueToSchedules
- in file queue_server.php, method QueueServer::dumpQueueToSchedules()
When a crawl is being shut down, this function is called to write the contents of the web queue bundle back to schedules. This allows crawls to be resumed without losing urls.
- database_bundle_iterator.php
- procedural page database_bundle_iterator.php
- doc_iterator.php
- procedural page doc_iterator.php
- doc_processor.php
- procedural page doc_processor.php
- datasource_manager.php
- procedural page datasource_manager.php
- displayresults_helper.php
- procedural page displayresults_helper.php
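The deltaList and deDeltaList entries in this section describe an inverse pair: one computes differences of a list of integers, the other reconstructs the original list. A minimal sketch of that round trip follows, written in Python rather than Yioop's PHP. The names delta_list and de_delta_list, and the choice to take the first difference relative to 0, are assumptions made for illustration and are not taken from utility.php.

```python
def delta_list(values):
    """Return successive differences of values.

    Assumption: the first difference is taken relative to 0, so the
    first output equals the first input.
    """
    deltas = []
    previous = 0
    for value in values:
        deltas.append(value - previous)
        previous = value
    return deltas


def de_delta_list(deltas):
    """Invert delta_list by accumulating a running sum of the differences."""
    values = []
    total = 0
    for delta in deltas:
        total += delta
        values.append(total)
    return values


# Hypothetical sorted doc offsets; the differences are smaller than the originals.
doc_offsets = [3, 7, 8, 15]
deltas = delta_list(doc_offsets)
restored = de_delta_list(deltas)
```

For a sorted list the differences tend to be small numbers, which is the usual reason to delta-encode before applying an integer compression scheme.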
e
- $editedPageSummaries
- in file model.php, variable Model::$editedPageSummaries
Associative array of page summaries which might be used to override default page summaries if set.
- $elements
- in file search_view.php, variable SearchView::$elements
Names of element objects that the view uses to display itself
- $elements
- in file settings_view.php, variable SettingsView::$elements
Names of element objects that the view uses to display itself
- $elements
- in file statistics_view.php, variable StatisticsView::$elements
Names of element objects that the view uses to display itself
- $elements
- in file view.php, variable View::$elements
Names of element objects that the view uses to display itself
- $elements
- in file static_view.php, variable StaticView::$elements
Names of element objects that the view uses to display itself
- $elements
- in file nocache_view.php, variable NocacheView::$elements
Names of element objects that the view uses to display itself
- $elements
- in file admin_view.php, variable AdminView::$elements
Names of element objects that the view uses to display itself
- $empty
- in file word_iterator.php, variable WordIterator::$empty
Keeps track of whether the word_iterator list is empty because the
- $encoding
- in file database_bundle_iterator.php, variable DatabaseBundleIterator::$encoding
What character encoding is used for the DB records
- $end_delimiter
- in file text_archive_bundle_iterator.php, variable TextArchiveBundleIterator::$end_delimiter
Ending delimiters for records
- $end_marker
- in file trie.php, variable Trie::$end_marker
The marker used to represent the end of an entry in a trie
- $end_of_iterator
- in file archive_bundle_iterator.php, variable ArchiveBundleIterator::$end_of_iterator
Whether or not the iterator still has more documents
- $extensions
- in file locale_model.php, variable LocaleModel::$extensions
File extensions of files to try to extract translatable strings from
- $extract_dirs
- in file locale_model.php, variable LocaleModel::$extract_dirs
Directories to try to extract translatable identifier strings from
- e
- in file utility.php, function e()
shorthand for echo
- Edge
- in file recipe_plugin.php, class Edge
class to define edge
- editCrawlOption
- in file admin_controller.php, method AdminController::editCrawlOption()
Called from @see manageCrawls to edit the parameters for the next crawl (or current crawl) to be carried out by the machines $machine_urls. Updates $data array to be supplied to AdminView
- EditlocalesElement
- in file editlocales_element.php, class EditlocalesElement
Element responsible for displaying the form where users can input string translations for a given locale
- editMix
- in file admin_controller.php, method AdminController::editMix()
Handles admin requests related to the edit crawl mix activity
- EditmixElement
- in file editmix_element.php, class EditmixElement
Element responsible for displaying info about a given crawl mix
- EditstaticElement
- in file editstatic_element.php, class EditstaticElement
Element responsible for drawing the screen used to set up the search engine
- Element
- in file element.php, class Element
Base Element Class.
- emptyDNSCache
- in file web_queue_bundle.php, method WebQueueBundle::emptyDNSCache()
Delete the Hash table used to store DNS lookup info.
- emptyRobotData
- in file web_queue_bundle.php, method WebQueueBundle::emptyRobotData()
Delete the Bloom filters used to store robots.txt file info.
- emptyUrlFilter
- in file web_queue_bundle.php, method WebQueueBundle::emptyUrlFilter()
Empty the crawled url filter for this web queue bundle; resets the timestamp of the last time this filter was emptied.
- encodeModified9
- in file utility.php, function encodeModified9()
Encodes a sequence of integers x, such that 1 <= x <= 2<<28-1 as a string.
- ENCODING
- in file crawl_constants.php, class constant CrawlConstants::ENCODING
- endMatch
- in file phrase_model.php, method PhraseModel::endMatch()
Matches terms (non white-char strings) in the language $lang_tag in
- END_ITERATOR
- in file crawl_constants.php, class constant CrawlConstants::END_ITERATOR
- enqueue
- in file recipe_plugin.php, method Queue::enqueue()
- EnStemmer
- in file tokenizer.php, class EnStemmer
My stab at implementing the Porter Stemmer algorithm presented at http://tartarus.org/~martin/PorterStemmer/def.txt The code is based on the non-thread safe C version given by Martin Porter.
- EN_RATIO
- in file config.php, constant EN_RATIO
Percentage of ASCII text before we guess we are dealing with English
- eof
- in file bzip2_block_iterator.php, method BZip2BlockIterator::eof()
Checks whether the current Bzip2 file has reached an end of file
- EpubProcessor
- in file epub_processor.php, class EpubProcessor
Used to create crawl summary information for XML files (those served as application/epub+zip)
- EpubProcessorTest
- in file epub_processor_test.php, class EpubProcessorTest
UnitTest for the EpubProcessor class. An EpubProcessor is used to process a .epub (ebook publishing standard) file and extract summary from it. This class tests the processing of an .epub file format by EpubProcessor.
- error.php
- procedural page error.php
- ERROR_CRAWL_DELAY
- in file config.php, constant ERROR_CRAWL_DELAY
Crawl-delay to set in the event that DOWNLOAD_ERROR_THRESHOLD exceeded
- ERROR_INFO
- in file config.php, constant ERROR_INFO
bit of DEBUG_LEVEL used to indicate php messages should be displayed
- escapeString
- in file datasource_manager.php, method DatasourceManager::escapeString()
Used to escape strings before insertion in the database to avoid SQL injection
- escapeString
- in file sqlite_manager.php, method SqliteManager::escapeString()
- escapeString
- in file pdo_manager.php, method PdoManager::escapeString()
- escapeString
- in file sqlite3_manager.php, method Sqlite3Manager::escapeString()
- escapeString
- in file mysql_manager.php, method MysqlManager::escapeString()
- exceedMemoryThreshold
- in file fetcher.php, method Fetcher::exceedMemoryThreshold()
Function to check if memory for this fetcher instance is getting low relative to what the system will allow.
- excludedPath
- in file code_tool.php, function excludedPath()
Checks if $path is amongst a list of paths which should be ignored
- exec
- in file pdo_manager.php, method PdoManager::exec()
- exec
- in file sqlite_manager.php, method SqliteManager::exec()
- exec
- in file mysql_manager.php, method MysqlManager::exec()
- exec
- in file datasource_manager.php, method DatasourceManager::exec()
Hook Method for execute(). Executes the sql command on the database
- exec
- in file sqlite3_manager.php, method Sqlite3Manager::exec()
- execMachines
- in file parallel_model.php, method ParallelModel::execMachines()
This method is invoked by other ParallelModel (@see CrawlModel
- execute
- in file datasource_manager.php, method DatasourceManager::execute()
Executes the supplied sql command on the database, depending on debug levels computes query statistics
- executeAssignmentRule
- in file page_rule_parser.php, method PageRuleParser::executeAssignmentRule()
Used to execute a single assignment rule on $page_data
- executeFunctionRule
- in file page_rule_parser.php, method PageRuleParser::executeFunctionRule()
Used to execute a single command rule on $page_data
- executeRuleTrees
- in file page_rule_parser.php, method PageRuleParser::executeRuleTrees()
Executes either the internal $rule_trees or the passed $rule_trees on the provided $page_data associative array
- exists
- in file trie.php, method Trie::exists()
Returns the sub trie_array under $term in $this->trie_array. If $term does not exist in $trie->trie_array returns false
- existsTestCase
- in file trie_test.php, method TrieTest::existsTestCase()
Check if we look up something in our Trie, that correct subtree
- extractActivityQuery
- in file search_controller.php, method SearchController::extractActivityQuery()
This method is responsible for parsing out the kind of query from the raw query string
- extractASCIIText
- in file doc_processor.php, method DocProcessor::extractASCIIText()
This is the main extractor of text from a Word doc. A Word Doc consists of a FIB, Piece Table, and DocumentStream. The last contains the text.
- extractHttpHttpsUrls
- in file text_processor.php, method TextProcessor::extractHttpHttpsUrls()
Tries to extract http or https links from a string of text.
- extractMergeLocales
- in file locale_model.php, method LocaleModel::extractMergeLocales()
Used to extract identifier strings from files with correct extensions, then these strings are merged with existing extracted strings for each locale as well as their translations (if an extracted string has a translation, the translation is untouched by this process).
- extractPhrases
- in file phrase_parser.php, method PhraseParser::extractPhrases()
Extracts all phrases (sequences of adjacent words) from $string. Does not extract terms within those phrases. Array key indicates position of phrase
- extractPhrasesAndCount
- in file phrase_parser.php, method PhraseParser::extractPhrasesAndCount()
Extracts all phrases (sequences of adjacent words) from $string. Does not extract terms within those phrases. Returns an associative array of phrase => number of occurrences of phrase
- extractPhrasesInLists
- in file phrase_parser.php, method PhraseParser::extractPhrasesInLists()
Extracts all phrases (sequences of adjacent words) from $string. Does extract terms within those phrases.
- extractPhrasesTestCase
- in file phrase_parser_test.php, method PhraseParserTest::extractPhrasesTestCase()
Tests the ability of extractPhrasesInLists to extract some hard-case
- extractPrefixRecord
- in file index_dictionary.php, method IndexDictionary::extractPrefixRecord()
Returns the $record_num'th prefix record from $prefix_string
- extractTermsAndFilterPhrases
- in file phrase_parser.php, method PhraseParser::extractTermsAndFilterPhrases()
Splits string according to punctuation and white space then extracts (stems/char grams) of terms and n word grams from the string
- extractText
- in file rtf_processor.php, method RtfProcessor::extractText()
Gets plain text out of an rtf string
- extractWordStringPageSummary
- in file phrase_parser.php, method PhraseParser::extractWordStringPageSummary()
Converts a summary of a web page into a string of space separated words
- epub_processor.php
- procedural page epub_processor.php
- epub_processor_test.php
- procedural page epub_processor_test.php
- editlocales_element.php
- procedural page editlocales_element.php
- editmix_element.php
- procedural page editmix_element.php
- editstatic_element.php
- procedural page editstatic_element.php
- element.php
- procedural page element.php
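The Trie entries in this index ($end_marker, exists) describe a nested-array trie in which a marker records the end of an entry and a lookup returns the sub-trie under a term, or false if the term is absent. A minimal sketch of that idea follows, in Python rather than Yioop's PHP. The function names, the use of nested dicts, and the marker value are assumptions for illustration and do not reproduce trie.php.

```python
# Hypothetical end-of-entry marker; the value Yioop actually uses is not shown here.
END_MARKER = " "


def trie_add(trie, term):
    """Insert term into the nested-dict trie and return the trie."""
    node = trie
    for char in term:
        node = node.setdefault(char, {})
    # Mark that a complete entry ends at this node.
    node[END_MARKER] = END_MARKER
    return trie


def trie_exists(trie, term):
    """Return the sub-trie under term, or False if term is not a stored prefix."""
    node = trie
    for char in term:
        if char not in node:
            return False
        node = node[char]
    return node


trie = {}
trie_add(trie, "crawl")
trie_add(trie, "crawler")
sub_trie = trie_exists(trie, "craw")   # sub-trie holding both completions
missing = trie_exists(trie, "index")   # False: no stored term starts this way
```

Returning the sub-trie rather than a boolean lets a caller enumerate every completion of a prefix, which suits suggestion-style lookups.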
f
- $fd
- in file bzip2_block_iterator.php, variable BZip2BlockIterator::$fd
File handle for bz2 file
- $feed_info
- in file word_iterator.php, variable WordIterator::$feed_info
- $feed_shard_name
- in file word_iterator.php, variable WordIterator::$feed_shard_name
- $fetcher_num
- in file fetcher.php, variable Fetcher::$fetcher_num
Which fetcher instance we are (if fetcher run as a job and more than one)
- $fetcher_prefix
- in file web_archive_bundle_iterator.php, variable WebArchiveBundleIterator::$fetcher_prefix
The fetcher prefix associated with this archive.
- $fh
- in file index_shard.php, variable IndexShard::$fh
File handle for a shard if we are going to use it in read mode and not completely load it.
- $fh
- in file text_archive_bundle_iterator.php, variable TextArchiveBundleIterator::$fh
File handle for current archive file
- $fhs
- in file index_dictionary.php, variable IndexDictionary::$fhs
Array of file handle for files in the dictionary. Members are used to read files to look up words.
- $field_value_separator
- in file database_bundle_iterator.php, variable DatabaseBundleIterator::$field_value_separator
For a given DB record each column is converted to a string:
- $filename
- in file persistent_structure.php, variable PersistentStructure::$filename
Name of the file in which to store the PersistentStructure
- $filename
- in file web_archive.php, variable WebArchive::$filename
Filename used to store the web archive.
- $file_len
- in file index_shard.php, variable IndexShard::$file_len
Keeps track of the length of the shard as a file
- $file_lens
- in file index_dictionary.php, variable IndexDictionary::$file_lens
Array of file lengths for files in the dictionary. Used so we don't try to seek past the end of files
- $file_offset
- in file bzip2_block_iterator.php, variable BZip2BlockIterator::$file_offset
Byte offset into bz2 file
- $filter
- in file network_iterator.php, variable NetworkIterator::$filter
Keeps track of whether the word_iterator list is empty because the
- $filter
- in file bloom_filter_file.php, variable BloomFilterFile::$filter
Packed string used to store the Bloom filters
- $filter
- in file doc_iterator.php, variable DocIterator::$filter
Used to keep track of docs to filter out of results
- $filter
- in file word_iterator.php, variable WordIterator::$filter
Keeps track of whether the word_iterator list is empty because the
- $filter_size
- in file web_queue_bundle.php, variable WebQueueBundle::$filter_size
Number of items that can be stored in a partition of the page exists filter
- $filter_size
- in file bloom_filter_bundle.php, variable BloomFilterBundle::$filter_size
The maximum capacity of a filter in this filter bundle
- $filter_size
- in file bloom_filter_file.php, variable BloomFilterFile::$filter_size
Size in bits of the packed string array used to store the filter's
- $found_sites
- in file fetcher.php, variable Fetcher::$found_sites
Summary information for visited sites that the fetcher hasn't sent to
- fetcher.php
- procedural page fetcher.php
- fetch_controller.php
- procedural page fetch_controller.php
- FALLBACK_LOCALE_DIR
- in file config.php, constant FALLBACK_LOCALE_DIR
- FeedsHelper
- in file feeds_helper.php, class FeedsHelper
Helper used to draw links and snippets for RSS feeds
- fetchArray
- in file pdo_manager.php, method PdoManager::fetchArray()
- fetchArray
- in file sqlite_manager.php, method SqliteManager::fetchArray()
- fetchArray
- in file mysql_manager.php, method MysqlManager::fetchArray()
- fetchArray
- in file sqlite3_manager.php, method Sqlite3Manager::fetchArray()
- fetchArray
- in file datasource_manager.php, method DatasourceManager::fetchArray()
Returns the next row from the provided result set
- FetchController
- in file fetch_controller.php, class FetchController
This class handles data coming to a queue_server from a fetcher. Basically, it receives the data from the fetcher and saves it into various files for later processing by the queue server.
- Fetcher
- in file fetcher.php, class Fetcher
This class is responsible for fetching web pages for the SeekQuarry/Yioop search engine
- FetchUrl
- in file fetch_url.php, class FetchUrl
Code used to manage HTTP requests from one or more URLs
- FetchView
- in file fetch_view.php, class FetchView
This view is displayed by the fetch_controller.php to send information to a fetcher about things like what to crawl next
- fetch_archive_iterator
- in file crawl_constants.php, class constant CrawlConstants::fetch_archive_iterator
- fetch_batch_name
- in file crawl_constants.php, class constant CrawlConstants::fetch_batch_name
- fetch_closed_name
- in file crawl_constants.php, class constant CrawlConstants::fetch_closed_name
- fetch_crawl_info
- in file crawl_constants.php, class constant CrawlConstants::fetch_crawl_info
- FETCH_SLEEP_TIME
- in file config.php, constant FETCH_SLEEP_TIME
an idling fetcher sleeps this long between queue_server pings
- FileCache
- in file file_cache.php, class FileCache
Library of functions used to implement a simple file cache. This might be used on systems that don't have memcache
- fileClose
- in file text_archive_bundle_iterator.php, method TextArchiveBundleIterator::fileClose()
Wrapper around particular compression scheme fclose function
- fileExtension
- in file non_compressor.php, method NonCompressor::fileExtension()
File extension that should be associated with this compressor
- fileExtension
- in file gzip_compressor.php, method GzipCompressor::fileExtension()
File extension that should be associated with this compressor
- fileExtension
- in file compressor.php, method Compressor::fileExtension()
File extension that should be associated with this compressor
- fileGets
- in file text_archive_bundle_iterator.php, method TextArchiveBundleIterator::fileGets()
Acts as gzgets(), hiding the fact that buffering of the archive_file is being done to a buffer file
- fileInfo
- in file utility.php, function fileInfo()
This is a callback function used in the process of recursively calculating an array of file modification times and file sizes for a directory
- fileInfoRecursive
- in file datasource_manager.php, method DatasourceManager::fileInfoRecursive()
Returns arrays of file sizes and file modification times of files in
- fileOpen
- in file text_archive_bundle_iterator.php, method TextArchiveBundleIterator::fileOpen()
Wrapper around particular compression scheme fopen function
- fileRead
- in file text_archive_bundle_iterator.php, method TextArchiveBundleIterator::fileRead()
Acts as gzread($num_bytes, $archive_file), hiding the fact that buffering of the archive_file is being done to a buffer file
- fileTell
- in file text_archive_bundle_iterator.php, method TextArchiveBundleIterator::fileTell()
Returns the current position in the current iterator partition file for the given compression scheme.
- FILETYPE
- in file crawl_constants.php, class constant CrawlConstants::FILETYPE
- FiletypeHelper
- in file filetype_helper.php, class FiletypeHelper
This helper class is used to handle
- fileWithTrim
- in file token_tool.php, function fileWithTrim()
Reads a file into an array, or outputs file not found. Trims each entry in the array. Any blank lines are deleted
- FILTER_SUFFIX
- in file nword_grams.php, class constant NWordGrams::FILTER_SUFFIX
Suffix appended to language tag to create the filter file name containing bigrams.
- findCommonIngredient
- in file recipe_plugin.php, method Tree::findCommonIngredient()
Finds the common ingredient for each of the clusters.
- findDocsWithWord
- in file index_bundle_iterator.php, method IndexBundleIterator::findDocsWithWord()
Hook function used by currentDocsWithWord to return the current block of docs if it is not cached
- findDocsWithWord
- in file negation_iterator.php, method NegationIterator::findDocsWithWord()
Hook function used by currentDocsWithWord to return the current block of docs if it is not cached
- findDocsWithWord
- in file doc_iterator.php, method DocIterator::findDocsWithWord()
Hook function used by currentDocsWithWord to return the current block of docs if it is not cached
- findDocsWithWord
- in file union_iterator.php, method UnionIterator::findDocsWithWord()
Hook function used by currentDocsWithWord to return the current block of docs if it is not cached
- findDocsWithWord
- in file intersect_iterator.php, method IntersectIterator::findDocsWithWord()
Hook function used by currentDocsWithWord to return the current block of docs if it is not cached
- findDocsWithWord
- in file group_iterator.php, method GroupIterator::findDocsWithWord()
Hook function used by currentDocsWithWord to return the current block of docs if it is not cached
- findDocsWithWord
- in file word_iterator.php, method WordIterator::findDocsWithWord()
Hook function used by currentDocsWithWord to return the current block of docs if it is not cached
- findDocsWithWord
- in file network_iterator.php, method NetworkIterator::findDocsWithWord()
Hook function used by currentDocsWithWord to return the current block of docs if it is not cached
- FIRST_CHAR_TEXT_SEG
- in file ppt_processor.php, class constant PptProcessor::FIRST_CHAR_TEXT_SEG
- fixLinksCallback
- in file mediawiki_bundle_iterator.php, function fixLinksCallback()
Used to change spaces to underscores in links generated from our earlier matching rules
- FIX_NAME_SERVER
- in file config.php, constant FIX_NAME_SERVER
- FLATTEN_FREQUENCY
- in file index_shard.php, class constant IndexShard::FLATTEN_FREQUENCY
Fraction of NUM_DOCS_PER_GENERATION document inserts before data
- FooterElement
- in file footer_element.php, class FooterElement
Element responsible for drawing footer links on search view and static view pages
- forceSave
- in file index_archive_bundle.php, method IndexArchiveBundle::forceSave()
Forces the current shard to be saved
- forceSave
- in file bloom_filter_bundle.php, method BloomFilterBundle::forceSave()
Used to save to disk all the file data associated with this bundle
- FORCE_SAVE_TIME
- in file config.php, constant FORCE_SAVE_TIME
Max time before dirty index (queue_server) and
- formatCachePage
- in file search_controller.php, method SearchController::formatCachePage()
Formats a cache of a web page (adds history ui and highlight keywords)
- formatDateByLocale
- in file locale_functions.php, function formatDateByLocale()
Function for formatting a date string based on the locale.
- formatPageResults
- in file model.php, method Model::formatPageResults()
Given an array of page summaries, for each summary extracts snippets which are related to a set of search words. For each snippet, boldfaces the search terms, and then creates a new summary array.
- formCluster
- in file recipe_plugin.php, method Tree::formCluster()
Forms the clusters by removing maximum weighted edges.
- fetch_url.php
- procedural page fetch_url.php
- file_cache.php
- procedural page file_cache.php
- footer_element.php
- procedural page footer_element.php
- fetch_view.php
- procedural page fetch_view.php
- feeds_helper.php
- procedural page feeds_helper.php
- filetype_helper.php
- procedural page filetype_helper.php
g
- $generation
- in file index_shard.php, variable IndexShard::$generation
This is supposed to hold the number of earlier shards, prior to the current shard.
- $generation_info
- in file index_archive_bundle.php, variable IndexArchiveBundle::$generation_info
Structure containing info about the current generation: its index (ACTIVE), and the number of words it contains (NUM_WORDS).
- $generation_pointer
- in file word_iterator.php, variable WordIterator::$generation_pointer
Index into dictionary_info corresponding to the current shard
- $got_robottxt_filter
- in file web_queue_bundle.php, variable WebQueueBundle::$got_robottxt_filter
BloomFilter used to store the hosts whose robots.txt file we
- $grouped_hashes
- in file group_iterator.php, variable GroupIterator::$grouped_hashes
hashes of document web pages used to keep track of
- $grouped_keys
- in file group_iterator.php, variable GroupIterator::$grouped_keys
hashed url keys used to keep track of groups seen so far
- genDocOffsetCmp
- in file index_bundle_iterator.php, method IndexBundleIterator::genDocOffsetCmp()
Compares two arrays each containing a (generation, offset) pair.
- general_is_a
- in file utility.php, function general_is_a()
Checks if class_1 is the same as class_2 or has class_2 as a parent. Behaves like the 3 param version (last param true) of the PHP is_a function that came into being with Version 5.3.9.
- generateCSRFToken
- in file controller.php, method Controller::generateCSRFToken()
Generates a token for preventing cross-site request forgery, based on the provided user name, the current time, and the hidden AUTH_KEY
- GENERATION
- in file crawl_constants.php, class constant CrawlConstants::GENERATION
- get
- in file resource_controller.php, method ResourceController::get()
Gets the resource $_REQUEST['n'] from APP_DIR/$_REQUEST['f'] or
- get
- in file analytics_manager.php, method AnalyticsManager::get()
Used to get the timing statistic associated with $attribute
- get
- in file file_cache.php, method FileCache::get()
Retrieve data associated with a key that has been put in the cache
- get
- in file string_array.php, method StringArray::get()
Looks up the ith item in the StringArray
- getActiveShard
- in file index_archive_bundle.php, method IndexArchiveBundle::getActiveShard()
Sets the current shard to be the active shard (the active shard is
- getActivityList
- in file activity_model.php, method ActivityModel::getActivityList()
Gets a list of activity ids, method names, and translated name of each available activity
- getActivityNameFromMethodName
- in file activity_model.php, method ActivityModel::getActivityNameFromMethodName()
Given the method name of a method to perform an activity return the translated activity name
- getAdditionalMetaWords
- in file indexing_plugin.php, method IndexingPlugin::getAdditionalMetaWords()
Returns an associative array of meta words => description length
- getAdditionalMetaWords
- in file recipe_plugin.php, method RecipePlugin::getAdditionalMetaWords()
Returns an array of additional meta words which have been added by this plugin
- getArchiveInfo
- in file web_archive_bundle.php, method WebArchiveBundle::getArchiveInfo()
Gets information about a WebArchiveBundle out of its description.txt file
- getArchiveInfo
- in file index_archive_bundle.php, method IndexArchiveBundle::getArchiveInfo()
Gets the description, count of summaries, and number of partitions of the
- getArchiveKind
- in file arc_tool.php, method ArcTool::getArchiveKind()
Given a folder name, determines the kind of bundle (if any) it holds.
- getArchiveName
- in file web_archive_bundle_iterator.php, method WebArchiveBundleIterator::getArchiveName()
Returns the path to an archive given its timestamp.
- getArchiveName
- in file arc_tool.php, method ArcTool::getArchiveName()
Given a complete path to an archive returns its filename
- getArchiveName
- in file mix_archive_bundle_iterator.php, method MixArchiveBundleIterator::getArchiveName()
Gets the filename of the file that holds information about the current archive iterator (such as whether the end of the iterator has been reached)
- getAttributeValue
- in file odp_rdf_bundle_iterator.php, method OdpRdfArchiveBundleIterator::getAttributeValue()
Gets the value of the attribute $attribute of the first dom node satisfying the xpath expression $path in the dom document $dom
- getAttributeValueAll
- in file odp_rdf_bundle_iterator.php, method OdpRdfArchiveBundleIterator::getAttributeValueAll()
Gets the value of the attribute $attribute for each dom node satisfying the xpath expression $path in the dom document $dom
- getBetweenTags
- in file text_processor.php, method TextProcessor::getBetweenTags()
Gets the text between two tags in a document starting at the current position.
- getBit
- in file bloom_filter_file.php, method BloomFilterFile::getBit()
Looks up the value of the ith bit position in the filter
- getBlockProgression
- in file locale_functions.php, function getBlockProgression()
Returns the current locale's method of writing blocks (things like divs or paragraphs). A language like English puts blocks one after another from the top of the page to the bottom. Other languages like classical Chinese list them from right to left.
- getBlockProgression
- in file locale_model.php, method LocaleModel::getBlockProgression()
The direction that blocks (such as p or div tags) should be drawn in the current locale
- getCacheFile
- in file crawl_model.php, method CrawlModel::getCacheFile()
Gets the cached version of a web page from the machine on which it was fetched.
- getCharGramsTerm
- in file phrase_parser.php, method PhraseParser::getCharGramsTerm()
Returns the character n-grams for the given terms, where n is the length Yioop uses for the language in question. If a stemmer is used for the language, then n-gramming is not done and this just returns an empty array
- getClassNameFromFileName
- in file index.php, function getClassNameFromFileName()
Convert the convention for unit test file names into our convention for unit test class names
- getCompanyLevelDomain
- in file fetcher.php, method Fetcher::getCompanyLevelDomain()
Calculates the company level domain for the given url
- getContents
- in file web_queue_bundle.php, method WebQueueBundle::getContents()
Gets the contents of the queue bundle as an array of ordered
- getContents
- in file priority_queue.php, method PriorityQueue::getContents()
Return the contents of the priority queue as an array of value weight pairs.
- getCost
- in file recipe_plugin.php, method Edge::getCost()
- getCrawlDelay
- in file web_queue_bundle.php, method WebQueueBundle::getCrawlDelay()
Gets the Crawl-delay of $host from the crawl delay bloom filter
- getCrawlItem
- in file parallel_model.php, method ParallelModel::getCrawlItem()
Get a summary of a document by the generation it is in and its offset into the corresponding WebArchive.
- getCrawlItems
- in file crawl_controller.php, method CrawlController::getCrawlItems()
Receives a request to get crawl summary data for an array of urls
- getCrawlItems
- in file parallel_model.php, method ParallelModel::getCrawlItems()
Gets summaries for a set of documents by their url, or by a group of 5-tuples of the form (machine, key, index, generation, offset).
- getCrawlItems
- in file search_controller.php, method SearchController::getCrawlItems()
Get crawl items based on queue server setting.
- getCrawlList
- in file crawl_controller.php, method CrawlController::getCrawlList()
Handles a request for the crawl list (what crawls are stored on the
- getCrawlList
- in file crawl_model.php, method CrawlModel::getCrawlList()
Gets a list of all index archives of crawls that have been conducted
- getCrawlMix
- in file crawl_model.php, method CrawlModel::getCrawlMix()
Retrieves the weighting component of the requested crawl mix
- getCrawlMixTimestamp
- in file crawl_model.php, method CrawlModel::getCrawlMixTimestamp()
Returns the timestamp associated with a mix name;
- getCrawlParametersFromSeedInfo
- in file admin_controller.php, method AdminController::getCrawlParametersFromSeedInfo()
Reads the parameters for a crawl from an array gotten from a crawl.ini file
- getCrawlSeedInfo
- in file crawl_controller.php, method CrawlController::getCrawlSeedInfo()
Handles a request for the starting parameters of a crawl of a given
- getCrawlSeedInfo
- in file crawl_model.php, method CrawlModel::getCrawlSeedInfo()
Returns the crawl parameters that were used during a given crawl
- getCrawlTimes
- in file fetch_controller.php, method FetchController::getCrawlTimes()
Gets a list of all the timestamps of previously stored crawls
- getCronTime
- in file cron_model.php, method CronModel::getCronTime()
Returns the timestamp of the last time cron ran. The db is not used, as sqlite seemed to have locking issues if the transaction takes a while
- getCurlIp
- in file fetch_url.php, method FetchUrl::getCurlIp()
Computes the IP address from the http get-response header
- getCurrentDocsForKeys
- in file index_bundle_iterator.php, method IndexBundleIterator::getCurrentDocsForKeys()
Gets the summaries associated with the keys provided the keys
- getCurrentDocsForKeys
- in file network_iterator.php, method NetworkIterator::getCurrentDocsForKeys()
Gets the summaries associated with the keys provided the keys
- getCurrentDocsForKeys
- in file union_iterator.php, method UnionIterator::getCurrentDocsForKeys()
Gets the summaries associated with the keys provided the keys
- getCurrentIndexDatabaseName
- in file crawl_model.php, method CrawlModel::getCurrentIndexDatabaseName()
Gets the name (aka timestamp) of the current index archive to be used to handle search queries
- getCurrentShard
- in file index_archive_bundle.php, method IndexArchiveBundle::getCurrentShard()
Returns the shard which is currently being used to read word-document data from the bundle. If one wants to write data to the bundle use getActiveShard() instead. The point of this method is to allow for lazy reading of the file associated with the shard.
- getDataArchiveFileData
- in file queue_server.php, method QueueServer::getDataArchiveFileData()
Used to get a data archive file (either during a normal crawl or a recrawl). After uncompressing this file (which comes via the web server through fetch_controller, from the fetcher), it sets which fetcher sent it and also returns the sites contained in it.
- getDbmsList
- in file model.php, method Model::getDbmsList()
Gets a list of all DBMS that work with the search engine
- getDeltaFileInfo
- in file crawl_model.php, method CrawlModel::getDeltaFileInfo()
Returns all the files in $dir or its subdirectories with modified times more recent than timestamp. Files which have in their path or name a string in the $excludes array will be excluded
- getDictSubstring
- in file index_dictionary.php, method IndexDictionary::getDictSubstring()
Gets from disk $len many bytes beginning at $offset from the $file_num prefix file in the index dictionary
- getDocIndexOfPostingAtOffset
- in file index_shard.php, method IndexShard::getDocIndexOfPostingAtOffset()
Returns the document index of the posting at offset $current in
- getDocInfoSubstring
- in file index_shard.php, method IndexShard::getDocInfoSubstring()
From disk gets $len many bytes starting from $offset in the doc_infos strings
- getDocumentFilename
- in file url_parser.php, method UrlParser::getDocumentFilename()
Gets the filename portion of a url if present; otherwise returns "Some File"
- getDocumentType
- in file url_parser.php, method UrlParser::getDocumentType()
Given a url, makes a guess at the file type of the file it points to
- getEarliestSlot
- in file queue_server.php, method QueueServer::getEarliestSlot()
Gets the first unfilled schedule slot after $index in $arr
- getEditedPageSummaries
- in file searchfilters_model.php, method SearchfiltersModel::getEditedPageSummaries()
Reads in and returns data on result pages whose summaries should be altered to something other than what's in the current index.
- getEndVertex
- in file recipe_plugin.php, method Edge::getEndVertex()
- getEntry
- in file hash_table.php, method HashTable::getEntry()
Get the ith entry of the array for the hash table (no hashing here)
- getFetchSites
- in file fetcher.php, method Fetcher::getFetchSites()
Prepare an array of up to NUM_MULTI_CURL_PAGES' worth of sites to be downloaded in one go using the to_crawl array. Delete these sites from the to_crawl array.
- getFileBlock
- in file text_archive_bundle_iterator.php, method TextArchiveBundleIterator::getFileBlock()
Reads and returns the block of data from the current partition
- getFilter
- in file searchfilters_model.php, method SearchfiltersModel::getFilter()
Gets a list of hashes of hostnames to be filtered from search results.
- getFragment
- in file url_parser.php, method UrlParser::getFragment()
Get the url fragment string component of a url
- getHashBitPositionArray
- in file bloom_filter_file.php, method BloomFilterFile::getHashBitPositionArray()
Hashes $value to a bit position in the BloomFilter
- getHost
- in file url_parser.php, method UrlParser::getHost()
Get the host name portion of a url if present; if not return false
- getHostPaths
- in file url_parser.php, method UrlParser::getHostPaths()
Gets an array of prefix urls from a given url. Each prefix contains at least the hostname of the start url
- getHostSubdomains
- in file url_parser.php, method UrlParser::getHostSubdomains()
Gets the subdomains of the host portion of a url. So
- getIndex
- in file index_manager.php, method IndexManager::getIndex()
Returns a reference to the managed copy of an IndexArchiveBundle object with a given timestamp or an IndexShard in the case where $index_name == "feed" (for handling news feeds)
- getIndexTimestamp
- in file search_controller.php, method SearchController::getIndexTimestamp()
Finds the timestamp of the main crawl or mix to return results from. Does not do checking to make sure timestamp exists.
- getInfoTimestamp
- in file crawl_controller.php, method CrawlController::getInfoTimestamp()
Handles a request for information about a crawl with a given timestamp
- getInfoTimestamp
- in file crawl_model.php, method CrawlModel::getInfoTimestamp()
Get a description associated with a Web Crawl or Crawl Mix
- getIngredientName
- in file recipe_plugin.php, method RecipePlugin::getIngredientName()
Extracts the main ingredient from the ingredient.
- getLabel
- in file recipe_plugin.php, method Vertex::getLabel()
- getLang
- in file url_parser.php, method UrlParser::getLang()
Attempts to guess the language tag based on url
- getLocaleDirection
- in file locale_model.php, method LocaleModel::getLocaleDirection()
The text direction of the current locale being used by the text engine
- getLocaleDirection
- in file locale_functions.php, function getLocaleDirection()
Returns the current language direction.
- getLocaleList
- in file locale_model.php, method LocaleModel::getLocaleList()
Returns information about all available locales
- getLocaleQueryStatistics
- in file locale_functions.php, function getLocaleQueryStatistics()
Returns the query statistics info for the current locale.
- getLocaleTag
- in file locale_model.php, method LocaleModel::getLocaleTag()
Get the current IANA language tag being used by the search engine
- getLocaleTag
- in file query_tool.php, function getLocaleTag()
Used within PhraseModel called from SearchController to do stemming
- getLockFileName
- in file crawl_daemon.php, method CrawlDaemon::getLockFileName()
Used to return the string name of the lock file used by a daemon
- getLog
- in file machine_model.php, method MachineModel::getLog()
Get either a fetcher or queue_server log for a machine
- getMachineList
- in file machine_model.php, method MachineModel::getMachineList()
Returns all the machine names stored in the DB
- getMachineStatuses
- in file machine_model.php, method MachineModel::getMachineStatuses()
Returns the statuses of machines in the machine table of their fetchers and queue_server as well as the names and urls of these machines
- getMachinesTimestamp
- in file parallel_model.php, method ParallelModel::getMachinesTimestamp()
- getMediaSources
- in file source_model.php, method SourceModel::getMediaSources()
Returns a list of media sources (such as video and rss sites) and their URL and thumb url formats, etc
- getMesssageFileName
- in file crawl_daemon.php, method CrawlDaemon::getMesssageFileName()
Used to return the string name of the messages file used to pass messages to a daemon running in the background
- getMetaRobots
- in file html_processor.php, method HtmlProcessor::getMetaRobots()
Gets any NOINDEX, NOFOLLOW, NOARCHIVE, or NONE info out of any robot meta tags.
- getMixList
- in file crawl_model.php, method CrawlModel::getMixList()
Gets a list of all mixes of available crawls
- getNameString
- in file crawl_daemon.php, method CrawlDaemon::getNameString()
Used to return a string name for a given daemon instance
- getNextObject
- in file rtf_processor.php, method RtfProcessor::getNextObject()
Gets the contents of the rtf group at the current position in the string
- getNextObject
- in file pdf_processor.php, method PdfProcessor::getNextObject()
Gets the content between an obj and endobj tag at the current position in a PDF document
- getNextTagData
- in file text_archive_bundle_iterator.php, method TextArchiveBundleIterator::getNextTagData()
Used to extract data between two tags. After operation $this->buffer has contents after the close tag.
- getNextTagsData
- in file text_archive_bundle_iterator.php, method TextArchiveBundleIterator::getNextTagsData()
Used to extract data between two tags for the first tag found amongst the array of tags $tags. After operation $this->buffer has contents after the close tag.
- getNextVertex
- in file recipe_plugin.php, method Tree::getNextVertex()
gets the next vertex from the adjacency matrix for a given vertex
- getNumDocsArray
- in file index_dictionary.php, method IndexDictionary::getNumDocsArray()
Given an array of $key => $word_id associations returns an array of $key => $num_docs of that $word_id
- getObjectDictionary
- in file pdf_processor.php, method PdfProcessor::getObjectDictionary()
Gets the object dictionary portion of the current PDF object
- getObjects
- in file web_archive.php, method WebArchive::getObjects()
Gets $num many objects out of the web archive starting at byte $offset
- getObjectStream
- in file pdf_processor.php, method PdfProcessor::getObjectStream()
Gets the object stream portion of the current PDF object
- getObjectTestCase
- in file web_archive_test.php, method WebArchiveTest::getObjectTestCase()
Does two addObjects of three objects each. Then does a getObjects to get
- getPage
- in file fetch_url.php, method FetchUrl::getPage()
Make a curl request for the provided url
- getPage
- in file web_archive_bundle.php, method WebArchiveBundle::getPage()
Gets a page in WebArchive $partition using the provided byte $offset and using existing $file_handle if possible.
- getPage
- in file index_archive_bundle.php, method IndexArchiveBundle::getPage()
Gets the page out of the summaries WebArchiveBundle with the given offset and generation
- getPages
- in file fetch_url.php, method FetchUrl::getPages()
Make multi_curl requests for an array of sites with urls
- getPagesToGroup
- in file group_iterator.php, method GroupIterator::getPagesToGroup()
Gets a sample of a few hundred pages on which to do grouping by URL
- getParamModifiedTime
- in file web_archive_bundle.php, method WebArchiveBundle::getParamModifiedTime()
Returns the last time the archive info of the bundle was modified.
- getParamModifiedTime
- in file index_archive_bundle.php, method IndexArchiveBundle::getParamModifiedTime()
Returns the last time the archive info of the bundle was modified.
- getPartition
- in file web_archive_bundle.php, method WebArchiveBundle::getPartition()
Gets an object encapsulating the $index-th WebArchive partition in this bundle.
- getPath
- in file url_parser.php, method UrlParser::getPath()
Get the path portion of a url if present; if not return NULL
- getPhrasePageResults
- in file phrase_model.php, method PhraseModel::getPhrasePageResults()
Given a query phrase, returns formatted document summaries of the documents that match the phrase.
- getPostingAtOffset
- in file index_shard.php, method IndexShard::getPostingAtOffset()
Gets the posting closest to index $current in the word_docs string
- getPostingsSlice
- in file index_shard.php, method IndexShard::getPostingsSlice()
Returns documents using the word_docs string (either as stored
- getPostingsSliceById
- in file index_shard.php, method IndexShard::getPostingsSliceById()
Returns $len many documents which contained the word corresponding to $word_id (only works for loaded shards)
- getProcessors
- in file recipe_plugin.php, method RecipePlugin::getProcessors()
Which mime type page processors this plugin should do additional processing for
- getProcessors
- in file indexing_plugin.php, method IndexingPlugin::getProcessors()
- getProfile
- in file profile_model.php, method ProfileModel::getProfile()
Reads a profile from a profile.php file in the provided directory
- getQuery
- in file url_parser.php, method UrlParser::getQuery()
Get the query string component of a url
- getQueryIterator
- in file phrase_model.php, method PhraseModel::getQueryIterator()
Using the supplied $word_structs, constructs an iterator for getting results to a query
- getQueueServerUrls
- in file machine_model.php, method MachineModel::getQueueServerUrls()
Returns urls for all the queue_servers stored in the DB
- getRecordStart
- in file warc_archive_bundle_iterator.php, method WarcArchiveBundleIterator::getRecordStart()
- getRegions
- in file tokenizer.php, method ItStemmer::getRegions()
Computes regions R1, R2 and RV in the form
- getRobotTxtAge
- in file web_queue_bundle.php, method WebQueueBundle::getRobotTxtAge()
Gets the timestamp of the oldest robot data still stored in
- getRoleActivities
- in file role_model.php, method RoleModel::getRoleActivities()
Get the activities (name, method, id) that a given role can perform
- getRoleId
- in file role_model.php, method RoleModel::getRoleId()
Get role id associated with rolename (so rolenames better be unique)
- getRoleList
- in file role_model.php, method RoleModel::getRoleList()
Get a list of all roles. Role names are not localized since these are created by end user admins of the search engine
- getRow
- in file priority_queue.php, method PriorityQueue::getRow()
Gets the ith element of the PriorityQueue viewed as an array
- getSeedInfo
- in file crawl_model.php, method CrawlModel::getSeedInfo()
Returns the initial sites that a new crawl will start with along with
- getShardHeader
- in file index_shard.php, method IndexShard::getShardHeader()
If not already loaded, reads in from disk the fixed-length'd field
- getShardInfo
- in file doc_iterator.php, method DocIterator::getShardInfo()
- getShardSubstring
- in file index_shard.php, method IndexShard::getShardSubstring()
Gets from disk $len many bytes beginning at $offset from the current IndexShard
- getShardWord
- in file index_shard.php, method IndexShard::getShardWord()
Reads a 32 bit word as an unsigned int from the offset given in the shard
- getSnippets
- in file model.php, method Model::getSnippets()
Given a string, extracts snippets of text related to a given set of key words. For a given word a snippet is a window of characters to its left and right that is less than a maximum total number of characters.
- getStartVertex
- in file recipe_plugin.php, method Edge::getStartVertex()
- getStaticPage
- in file locale_model.php, method LocaleModel::getStaticPage()
Returns the static page with the given name translated to the given locale_tag
- getStaticPageList
- in file locale_model.php, method LocaleModel::getStaticPageList()
Returns a list of the static pages that can be localized
- getStemmer
- in file phrase_parser.php, method PhraseParser::getStemmer()
Loads and instantiates a stemmer object for a language if one exists
- getStringData
- in file locale_model.php, method LocaleModel::getStringData()
For each translatable identifier string (either static from a configure ini file, or dynamic from the db) return its name together with its translation into the given locale if such a translation exists.
- getSubsearches
- in file source_model.php, method SourceModel::getSubsearches()
Returns a list of the subsearches used by the current Yioop instance including their names translated to the current locale
- getSummariesByHash
- in file phrase_model.php, method PhraseModel::getSummariesByHash()
Gets doc summaries of documents containing given words and meeting the
- getSummariesFromOffsets
- in file phrase_model.php, method PhraseModel::getSummariesFromOffsets()
Used to look up summary info for the pages provided (using their self::SUMMARY_OFFSET field). If any of the looked-up summaries are locations, then looks these up in turn. This method handles robot meta tags which might forbid indexing.
- getTestNames
- in file index.php, function getTestNames()
Gets the names of all the unit test files in the current directory.
- getText
- in file pdf_processor.php, method PdfProcessor::getText()
Gets the text out of a PDF document
- getText
- in file rtf_processor.php, method RtfProcessor::getText()
Gets plain text out of an rtf string
- getTextContent
- in file mediawiki_bundle_iterator.php, method MediaWikiArchiveBundleIterator::getTextContent()
Gets the text content of the first dom node satisfying the xpath expression $path in the dom document $dom
- getTextContent
- in file odp_rdf_bundle_iterator.php, method OdpRdfArchiveBundleIterator::getTextContent()
Gets the text content of the first dom node satisfying the xpath expression $path in the dom document $dom
- getTopPhrases
- in file search_controller.php, method SearchController::getTopPhrases()
Given a page summary extract the words from it and try to find documents
- getTranslateStrings
- in file locale_model.php, method LocaleModel::getTranslateStrings()
Searches the directories provided looking for files matching the extensions provided. When such a file is found it is loaded and scanned for tl() function calls. The identifier string in this function call is then extracted and added to a line array of strings to be translated.
- getUrlFilterAge
- in file web_queue_bundle.php, method WebQueueBundle::getUrlFilterAge()
Gets the timestamp of the oldest url filter data still stored in
- getUrls
- in file searchfilters_model.php, method SearchfiltersModel::getUrls()
Gets a list of hostnames to be filtered from search results.
- getUserActivities
- in file user_model.php, method UserModel::getUserActivities()
Get a list of admin activities that a user is allowed to perform.
- getUserId
- in file signin_model.php, method SigninModel::getUserId()
Get the user_id associated with a given username
- getUserList
- in file user_model.php, method UserModel::getUserList()
Returns an array of all user_names
- getUserName
- in file signin_model.php, method SigninModel::getUserName()
Get the user_name associated with a given userid
- getUserRoles
- in file user_model.php, method UserModel::getUserRoles()
Gets all the roles associated with a user id
- getUserSession
- in file user_model.php, method UserModel::getUserSession()
Returns $_SESSION variable of given user from the last time logged in.
- getValues
- in file trie.php, method Trie::getValues()
Returns all the terms in the trie beneath the provided term prefix
- getValuesTestCase
- in file trie_test.php, method TrieTest::getValuesTestCase()
Check that if we can get all the terms from a trie that begin
- getVarField
- in file page_rule_parser.php, method PageRuleParser::getVarField()
Either returns $var_name or the value of the CrawlConstant with name $var_name.
- getWarcHeaders
- in file warc_archive_bundle_iterator.php, method WarcArchiveBundleIterator::getWarcHeaders()
Used to parse the header portion of a WARC record
- getWordDocsSubstring
- in file index_shard.php, method IndexShard::getWordDocsSubstring()
From disk gets $len many bytes starting from $offset in the word_docs strings
- getWordDocsWord
- in file index_shard.php, method IndexShard::getWordDocsWord()
Reads 32 bit word as an unsigned int from the offset given in the
- getWordInfo
- in file index_shard.php, method IndexShard::getWordInfo()
Returns the first offset, last offset, and number of documents the word occurred in for this shard. The first offset (similarly, the last offset) is the byte offset into the word_docs string of the first (last) record involving that word.
- getWordInfo
- in file index_dictionary.php, method IndexDictionary::getWordInfo()
For each index shard generation a word occurred in, return as part of
- getWordInfoFromString
- in file index_shard.php, method IndexShard::getWordInfoFromString()
Converts $str into 3 ints for a first offset into word_docs, a last offset into word_docs, and a count of number of docs with that word.
- getWordInfoTier
- in file index_dictionary.php, method IndexDictionary::getWordInfoTier()
This method facilitates query processing of an ongoing crawl.
- getWordsIfHostUrl
- in file url_parser.php, method UrlParser::getWordsIfHostUrl()
Given a url, extracts the words in the host part of the url provided the url does not have a path part more than / .
- getWordsLastPathPartUrl
- in file url_parser.php, method UrlParser::getWordsLastPathPartUrl()
Given a url, extracts the words in the last path part of the url
- getWritingMode
- in file locale_functions.php, function getWritingMode()
Returns the writing mode of the current locale. This is a combination of the locale direction and the block progression. For instance, for English the writing mode is lr-tb (left-to-right top-to-bottom).
- getWritingMode
- in file locale_model.php, method LocaleModel::getWritingMode()
Get the writing mode of the current locale (text and block directions)
- GifProcessor
- in file gif_processor.php, class GifProcessor
Used to create crawl summary information for GIF files
- GOT_ROBOT_TXT
- in file crawl_constants.php, class constant CrawlConstants::GOT_ROBOT_TXT
- greaterThan
- in file utility.php, function greaterThan()
Callback to check if $a is greater than $b
- groupByHashAndAggregate
- in file group_iterator.php, method GroupIterator::groupByHashAndAggregate()
For documents which had been previously grouped by the hash of their url, groups these groups further by the hash of their page contents.
- groupByHashUrl
- in file group_iterator.php, method GroupIterator::groupByHashUrl()
Groups documents as well as mini-pages based on links to documents by
- GroupIterator
- in file group_iterator.php, class GroupIterator
This iterator is used to group together documents or document parts
- guessLangEncoding
- in file locale_functions.php, function guessLangEncoding()
Tries to guess at a language tag based on the name of a character encoding
- guessLocale
- in file locale_functions.php, function guessLocale()
Attempts to guess the user's locale based on the request, session, and user-agent data
- guessLocaleFromString
- in file locale_functions.php, function guessLocaleFromString()
Attempts to guess the user's locale based on a string sample
- guessSemantics
- in file phrase_model.php, method PhraseModel::guessSemantics()
Ideally, this function tries to guess from the query what the user is looking for. For now, we are just doing simple things like detecting when a query term is a url and rewriting it to the appropriate meta word.
- GzipCompressor
- in file gzip_compressor.php, class GzipCompressor
Implementation of a Compressor using GZIP/GUNZIP as the filter.
- gzip_compressor.php
- procedural page gzip_compressor.php
- group_iterator.php
- procedural page group_iterator.php
- gif_processor.php
- procedural page gif_processor.php
top
h
- $hash_rebuild_count
- in file web_queue_bundle.php, variable WebQueueBundle::$hash_rebuild_count
Current count of the number of non-read operations performed on the WebQueueBundle's hash table since the last time it was rebuilt.
- $header
- in file odp_rdf_bundle_iterator.php, variable OdpRdfArchiveBundleIterator::$header
Associative array containing global properties like base url of the
- $header_info
- in file bzip2_block_iterator.php, variable BZip2BlockIterator::$header_info
Lookup table for the number of bits by which the magic
- $helpers
- in file admin_view.php, variable AdminView::$helpers
Names of helper objects that the view uses to help draw itself
- $helpers
- in file machinestatus_view.php, variable MachinestatusView::$helpers
Names of helper objects that the view uses to help draw itself
- $helpers
- in file view.php, variable View::$helpers
Names of helper objects that the view uses to help draw itself
- $helpers
- in file rss_view.php, variable RssView::$helpers
Names of helper objects that the view uses to help draw itself
- $helpers
- in file statistics_view.php, variable StatisticsView::$helpers
Names of helper objects that the view uses to help draw itself
- $helpers
- in file search_view.php, variable SearchView::$helpers
Names of helper objects that the view uses to help draw itself
- $helpers
- in file settings_view.php, variable SettingsView::$helpers
Names of helper objects that the view uses to help draw itself
- $hosts_with_errors
- in file fetcher.php, variable Fetcher::$hosts_with_errors
An array to keep track of hosts which have had a lot of http errors
- $hourly_crawl_data
- in file queue_server.php, variable QueueServer::$hourly_crawl_data
This is a list of hourly (timestamp, number_of_urls_crawled) data
- HALF_BLANK
- in file index_shard.php, class constant IndexShard::HALF_BLANK
Flag used to indicate that a word item should not be packed or unpacked
- handle
- in file page_processor.php, method PageProcessor::handle()
Method used to handle processing data for a web page. It makes
- handleAdminMessages
- in file queue_server.php, method QueueServer::handleAdminMessages()
Handles messages passed via files to the QueueServer.
- handleUploadedData
- in file fetch_controller.php, method FetchController::handleUploadedData()
After robot, schedule, and index data have been uploaded and reassembled
- HASH
- in file crawl_constants.php, class constant CrawlConstants::HASH
- hash
- in file hash_table.php, method HashTable::hash()
Hashes the provided key to an index in the array of the hash table
- hasHostUrl
- in file url_parser.php, method UrlParser::hasHostUrl()
Checks if the url has a host part.
- HashTable
- in file hash_table.php, class HashTable
Code used to manage a memory efficient hash table. Weights for the queue must be floats
- HashTableTest
- in file hash_table_test.php, class HashTableTest
Used to test that the HashTable class properly stores key value pairs and handles inserts, deletes, and collisions okay. It should also detect when the table is full
- HASH_KEY_SIZE
- in file web_queue_bundle.php, class constant WebQueueBundle::HASH_KEY_SIZE
Number of bytes for a hash table key
- HASH_SEEN_URLS
- in file crawl_constants.php, class constant CrawlConstants::HASH_SEEN_URLS
- HASH_SUM_SCORE
- in file crawl_constants.php, class constant CrawlConstants::HASH_SUM_SCORE
- HASH_URL
- in file crawl_constants.php, class constant CrawlConstants::HASH_URL
- HASH_URL_COUNT
- in file crawl_constants.php, class constant CrawlConstants::HASH_URL_COUNT
- HASH_VALUE_SIZE
- in file web_queue_bundle.php, class constant WebQueueBundle::HASH_VALUE_SIZE
4 bytes offset, 4 bytes index, 4 bytes flags
- HEADER
- in file crawl_constants.php, class constant CrawlConstants::HEADER
- headerToShardFields
- in file index_shard.php, method IndexShard::headerToShardFields()
Splits a header string into a shard's field variables
- HEADER_LENGTH
- in file index_shard.php, class constant IndexShard::HEADER_LENGTH
Header Length of an IndexShard (sum of its non-variable length fields)
- Helper
- in file helper.php, class Helper
Base Helper Class.
- historyUI
- in file search_controller.php, method SearchController::historyUI()
User Interface for history feature
- HOST_KEY_POS
- in file word_iterator.php, class constant WordIterator::HOST_KEY_POS
Host Key position + 1 (first char says doc, inlink or external link)
- HOST_KEY_POS
- in file network_iterator.php, class constant NetworkIterator::HOST_KEY_POS
Host Key position + 1 (first char says doc, inlink or external link)
- HOST_KEY_POS
- in file doc_iterator.php, class constant DocIterator::HOST_KEY_POS
Host Key position + 1 (first char says doc, inlink or external link)
- HtmlProcessor
- in file html_processor.php, class HtmlProcessor
Used to create crawl summary information for HTML files
- HTTP_CODE
- in file crawl_constants.php, class constant CrawlConstants::HTTP_CODE
- hash_table.php
- procedural page hash_table.php
- html_processor.php
- procedural page html_processor.php
- hash_table_test.php
- procedural page hash_table_test.php
- helper.php
- procedural page helper.php
top
i
- $indexed_file_types
- in file queue_server.php, variable QueueServer::$indexed_file_types
List of file extensions supported for the crawl
- $indexed_file_types
- in file fetcher.php, variable Fetcher::$indexed_file_types
List of file extensions supported for the crawl
- $indexing_plugins
- in file page_processor.php, variable PageProcessor::$indexing_plugins
indexing_plugins which might be used with the current processor
- $indexing_plugins
- in file queue_server.php, variable QueueServer::$indexing_plugins
This is a list of indexing_plugins which might do post processing after the crawl. The plugin's postProcessing function is called if it is selected in the crawl options page.
- $indexing_plugins
- in file controller.php, variable Controller::$indexing_plugins
Says which post processing indexing plugins are available
- $index_archive
- in file indexing_plugin.php, variable IndexingPlugin::$index_archive
The IndexArchiveBundle object that this indexing plugin might
- $index_archive
- in file queue_server.php, variable QueueServer::$index_archive
Holds the IndexArchiveBundle for the current crawl. This encapsulates the inverted index word-->documents for the crawls as well as document summaries of each document.
- $index_bundle_iterator
- in file group_iterator.php, variable GroupIterator::$index_bundle_iterator
The iterator we are using to get documents from
- $index_bundle_iterators
- in file intersect_iterator.php, variable IntersectIterator::$index_bundle_iterators
An array of iterators whose intersection we get documents from
- $index_bundle_iterators
- in file negation_iterator.php, variable NegationIterator::$index_bundle_iterators
An array of iterators whose intersection we get documents from
- $index_bundle_iterators
- in file union_iterator.php, variable UnionIterator::$index_bundle_iterators
An array of iterators whose intersection we get documents from
- $index_dirty
- in file queue_server.php, variable QueueServer::$index_dirty
flags for whether the index has data to be written to disk
- $index_name
- in file word_iterator.php, variable WordIterator::$index_name
The timestamp of the index associated with this iterator
- $index_name
- in file parallel_model.php, variable ParallelModel::$index_name
Stores the name of the current index archive to use to get search
- $index_name
- in file doc_iterator.php, variable DocIterator::$index_name
The timestamp of the index associated with this iterator
- $index_time_stamp
- in file statistics_controller.php, variable StatisticsController::$index_time_stamp
Timestamp of the crawl that statistics are being generated for
- $ini
- in file text_archive_bundle_iterator.php, variable TextArchiveBundleIterator::$ini
Contains basic parameters of how this iterator works: compression, start and stop delimiter. Typically, this data is read from the arc_description.ini file
- $is_string
- in file web_archive.php, variable WebArchive::$is_string
Says whether the archive is a string archive
- $iterate_dir
- in file database_bundle_iterator.php, variable DatabaseBundleIterator::$iterate_dir
The path to the directory containing the archive partitions to be iterated over.
- $iterate_dir
- in file text_archive_bundle_iterator.php, variable TextArchiveBundleIterator::$iterate_dir
The path to the directory containing the archive partitions to be iterated over.
- $iterate_timestamp
- in file archive_bundle_iterator.php, variable ArchiveBundleIterator::$iterate_timestamp
Timestamp of the archive that is being iterated over
- $iterator_pos
- in file web_archive.php, variable WebArchive::$iterator_pos
Current offset into the web archive the iterator for the archive is at
- imageCachePage
- in file search_controller.php, method SearchController::imageCachePage()
Makes an HTML web page for an image cache item
- imagecreatefrombmp
- in file bmp_processor.php, method BmpProcessor::imagecreatefrombmp()
Reads in a 32 / 24 bit non-palette bmp file from the provided filename and returns a php image object corresponding to it. This is a crude variation of code from imagecreatewbmp function documentation at php.net
- ImageProcessor
- in file image_processor.php, class ImageProcessor
Base abstract class common to all processors used to create crawl summary information from images
- ImagesHelper
- in file images_helper.php, class ImagesHelper
Helper used to draw thumbnail strips for images
- in
- in file tokenizer.php, method ItStemmer::in()
Checks if a string occurs in another string
- INDEX
- in file crawl_constants.php, class constant CrawlConstants::INDEX
- index.php
- procedural page index.php
- IndexArchiveBundle
- in file index_archive_bundle.php, class IndexArchiveBundle
Encapsulates a set of web page summaries and an inverted word-index of terms from these summaries which allow one to search for summaries containing a particular word.
- IndexBundleIterator
- in file index_bundle_iterator.php, class IndexBundleIterator
Abstract class used to model iterating documents indexed in an IndexArchiveBundle or set of such bundles.
- IndexDictionary
- in file index_dictionary.php, class IndexDictionary
Data structure used to store entries of the form:
- INDEXED_FILE_TYPES
- in file crawl_constants.php, class constant CrawlConstants::INDEXED_FILE_TYPES
- INDEXER
- in file crawl_constants.php, class constant CrawlConstants::INDEXER
Used to say what kind of queue_server this is
- indexExists
- in file phrase_model.php, method PhraseModel::indexExists()
Returns whether there is an index with the provided timestamp
- IndexingPlugin
- in file indexing_plugin.php, class IndexingPlugin
Base indexing plugin Class. An indexing plugin allows a developer
- INDEXING_PLUGINS
- in file crawl_constants.php, class constant CrawlConstants::INDEXING_PLUGINS
- IndexManager
- in file index_manager.php, class IndexManager
Class used to manage open IndexArchiveBundle's while performing a query. Ensures an easy place to obtain references to these bundles and ensures only one object per bundle is instantiated in a Singleton-esque way.
- indexSave
- in file queue_server.php, method QueueServer::indexSave()
Saves the index_archive and, in particular, its current shard to disk
- IndexShard
- in file index_shard.php, class IndexShard
Data structure used to store one generation worth of the word document index (inverted index).
- IndexShardTest
- in file index_shard_test.php, class IndexShardTest
Used to test that the StringArray class properly stores/retrieves values, and can handle loading and saving
- index_closed_name
- in file crawl_constants.php, class constant CrawlConstants::index_closed_name
- index_data_base_name
- in file crawl_constants.php, class constant CrawlConstants::index_data_base_name
- INI
- in file crawl_constants.php, class constant CrawlConstants::INI
- init
- in file crawl_daemon.php, method CrawlDaemon::init()
Used to send a message to the given daemon or run the program in the foreground.
- initCountIfNotExists
- in file web_archive_bundle.php, method WebArchiveBundle::initCountIfNotExists()
Creates a new counter to be maintained in the description.txt file if the counter doesn't exist, leaves unchanged otherwise
- initGenerationToAdd
- in file index_archive_bundle.php, method IndexArchiveBundle::initGenerationToAdd()
Determines, based on its size, whether index_shard should be added to the active generation or a new generation should be started.
- initialize
- in file locale_model.php, method LocaleModel::initialize()
Loads the provided locale's configure file (containing translations) and calls setlocale to set up locale specific string formatting (to format numbers, etc.)
- initializeIndexInfo
- in file search_controller.php, method SearchController::initializeIndexInfo()
Determines which crawl or mix timestamp should be in use for this query. It also determines and returns info associated with this timestamp.
- initializeResponseFormat
- in file search_controller.php, method SearchController::initializeResponseFormat()
Determines how this query is being run and returns variables for the view
- initializeSubsearches
- in file search_controller.php, method SearchController::initializeSubsearches()
Determines if query results are using a subsearch and, if so, initializes them. It also sets up the list of subsearches to draw at the top of the screen.
- initializeSubstitutions
- in file mediawiki_bundle_iterator.php, method MediaWikiArchiveBundleIterator::initializeSubstitutions()
Used to initialize the arrays of match/replacements used to format
- initializeUserAndDefaultActivity
- in file search_controller.php, method SearchController::initializeUserAndDefaultActivity()
Determines the kind of user session that this search request is for
- initializeWebQueue
- in file queue_server.php, method QueueServer::initializeWebQueue()
This method sets up a WebQueueBundle according to the current crawl order so that it can receive urls and prioritize them.
- injectUrlsCurrentCrawl
- in file crawl_controller.php, method CrawlController::injectUrlsCurrentCrawl()
Receives a request to inject new urls into the active
- injectUrlsCurrentCrawl
- in file crawl_model.php, method CrawlModel::injectUrlsCurrentCrawl()
Add the provided urls to the schedule directory of URLs that will be crawled
- INLINKS
- in file crawl_constants.php, class constant CrawlConstants::INLINKS
- insert
- in file priority_queue.php, method PriorityQueue::insert()
Inserts a new item into the priority queue.
- insert
- in file hash_table.php, method HashTable::insert()
Inserts the provided $key - $value pair into the hash table
- insertDeleteLookupTestCase
- in file hash_table_test.php, method HashTableTest::insertDeleteLookupTestCase()
Inserts an item, deletes that item, then looks it up, making sure we don't find it after deletion.
- insertHashTable
- in file web_queue_bundle.php, method WebQueueBundle::insertHashTable()
Inserts the $key, $value pair into this web queue's to crawl table
- insertID
- in file pdo_manager.php, method PdoManager::insertID()
- insertID
- in file sqlite_manager.php, method SqliteManager::insertID()
- insertID
- in file datasource_manager.php, method DatasourceManager::insertID()
Returns the ID generated by the last insert statement if table has an auto increment key column
- insertID
- in file sqlite3_manager.php, method Sqlite3Manager::insertID()
- insertID
- in file mysql_manager.php, method MysqlManager::insertID()
- insertLookupTestCase
- in file hash_table_test.php, method HashTableTest::insertLookupTestCase()
Check if for the big hash table we insert something then later look it
- insertReferences
- in file mediawiki_bundle_iterator.php, method MediaWikiArchiveBundleIterator::insertReferences()
After regex processing has been done on a wiki page this function inserts into the resulting page a reference at {{reflist locations, then returns the result page
- insertTableOfContents
- in file mediawiki_bundle_iterator.php, method MediaWikiArchiveBundleIterator::insertTableOfContents()
After regex processing has been done on a wiki page this function inserts into the resulting page a table of contents just before the first h2 tag, then returns the result page
- instantiateIterator
- in file arc_tool.php, method ArcTool::instantiateIterator()
Used to create an archive_bundle_iterator for a non-yioop archive. As these iterators sometimes make use of a folder to store savepoints, we create a temporary folder for this purpose in the current directory. This should be garbage collected elsewhere.
- IntersectIterator
- in file intersect_iterator.php, class IntersectIterator
Used to iterate over the documents which occur in all of a set of iterator results
- inTestCase
- in file bloom_filter_file_test.php, method BloomFilterFileTest::inTestCase()
Tests if we insert something into the bloom filter, that when we look it
- INT_SIZE
- in file web_queue_bundle.php, class constant WebQueueBundle::INT_SIZE
Size of int
- INVERTED_INDEX
- in file crawl_constants.php, class constant CrawlConstants::INVERTED_INDEX
- IN_LINK
- in file config.php, constant IN_LINK
- IP_ADDRESSES
- in file crawl_constants.php, class constant CrawlConstants::IP_ADDRESSES
- IP_LINK
- in file config.php, constant IP_LINK
- IP_SIZE
- in file web_queue_bundle.php, class constant WebQueueBundle::IP_SIZE
Length of an IPv6 ip address (IPv4 addresses are padded)
- isAIndexer
- in file queue_server.php, method QueueServer::isAIndexer()
Used to check if the current queue_server process is acting as an indexer of data coming from fetchers
- isAScheduler
- in file queue_server.php, method QueueServer::isAScheduler()
Used to check if the current queue_server process is acting as a url scheduler for fetchers
- isCrawlMix
- in file crawl_model.php, method CrawlModel::isCrawlMix()
Returns whether the supplied timestamp corresponds to a crawl mix
- isEmpty
- in file recipe_plugin.php, method Queue::isEmpty()
- isLocalhostUrl
- in file url_parser.php, method UrlParser::isLocalhostUrl()
Checks if a $url is on localhost
- isOnlyIndexer
- in file queue_server.php, method QueueServer::isOnlyIndexer()
Used to check if the current queue_server process is acting only as an indexer of data coming from fetchers (and not some other activity like scheduler as well)
- isOnlyScheduler
- in file queue_server.php, method QueueServer::isOnlyScheduler()
Used to check if the current queue_server process is acting only as a url scheduler for fetchers (and not some other activity like indexer as well)
- isPathMemberRegexPaths
- in file url_parser.php, method UrlParser::isPathMemberRegexPaths()
Checks if $path matches against any of the Robots.txt style regex paths in $paths
- isPathMemberRegexPathsTestCase
- in file url_parser_test.php, method UrlParserTest::isPathMemberRegexPathsTestCase()
Check if a path matches with a list of paths presumably coming from
- isSchemeHttpOrHttps
- in file url_parser.php, method UrlParser::isSchemeHttpOrHttps()
Checks if the url scheme is either http or https.
- isSingleLocalhost
- in file model.php, method Model::isSingleLocalhost()
Used to determine if an action involves just one yioop instance on the current local machine or not
- isTranslated
- in file locale_model.php, method LocaleModel::isTranslated()
Checks if the given string_id has a translation in translations
- isVideoUrl
- in file url_parser.php, method UrlParser::isVideoUrl()
Checks if a URL corresponds to a known playback page of a video sharing site
- isVisited
- in file recipe_plugin.php, method Vertex::isVisited()
- isVowel
- in file tokenizer.php, method ItStemmer::isVowel()
Checks if a character is a vowel or not
- IS_DOC
- in file crawl_constants.php, class constant CrawlConstants::IS_DOC
- IS_FEED
- in file crawl_constants.php, class constant CrawlConstants::IS_FEED
- ItStemmer
- in file tokenizer.php, class ItStemmer
Italian specific tokenization code. Typically, tokenizer.php either contains a stemmer for the language in question or it specifies how many characters in a char gram
- ItStemmerTest
- in file it_stemmer_test.php, class ItStemmerTest
My code for testing the Italian stemming algorithm. The inputs for the
- indexing_plugin.php
- procedural page indexing_plugin.php
- index_archive_bundle.php
- procedural page index_archive_bundle.php
- index_bundle_iterator.php
- procedural page index_bundle_iterator.php
- intersect_iterator.php
- procedural page intersect_iterator.php
- index_dictionary.php
- procedural page index_dictionary.php
- index_manager.php
- procedural page index_manager.php
- index_shard.php
- procedural page index_shard.php
- image_processor.php
- procedural page image_processor.php
- index.php
- procedural page index.php
- index_shard_test.php
- procedural page index_shard_test.php
- it_stemmer_test.php
- procedural page it_stemmer_test.php
- images_helper.php
- procedural page images_helper.php
top
j
- $j
- in file tokenizer.php, variable EnStemmer::$j
Index to start of the suffix of the word being considered for
- join
- in file queue_server.php, method QueueServer::join()
This is a callback method that IndexArchiveBundle will periodically call when it processes a method that takes a long time. This allows, for instance, continued processing of index data while say a dictionary merge is being performed.
- join
- in file join.php, method Join::join()
A callback function which will be invoked periodically by a method of another object that runs a long time.
- Join
- in file join.php, class Join
Marker interface used to say that a class supports a join()
- JpgProcessor
- in file jpg_processor.php, class JpgProcessor
Used to create crawl summary information for JPEG files
- JUST_METAS
- in file crawl_constants.php, class constant CrawlConstants::JUST_METAS
- join.php
- procedural page join.php
- jpg_processor.php
- procedural page jpg_processor.php
top
l
- $last_flattened_words_count
- in file index_shard.php, variable IndexShard::$last_flattened_words_count
Number of document inserts since the last time word data was flattened to the word_postings string.
- $last_index_save_time
- in file queue_server.php, variable QueueServer::$last_index_save_time
Last time index was saved to disk
- $last_notify
- in file mirror.php, variable Mirror::$last_notify
Last time the machine being mirrored was notified mirror.php is still
- $last_offset
- in file word_iterator.php, variable WordIterator::$last_offset
Last Offset of word occurrence in the IndexShard
- $last_offset
- in file doc_iterator.php, variable DocIterator::$last_offset
Last Offset of a doc occurrence in the IndexShard
- $last_sync
- in file mirror.php, variable Mirror::$last_sync
Last time a sync list was obtained from master machines
- $last_sync_file
- in file mirror.php, variable Mirror::$last_sync_file
File name where last sync time is written
- $layout
- in file rss_view.php, variable RssView::$layout
This view is drawn on a web layout
- $layout
- in file admin_view.php, variable AdminView::$layout
This view is drawn on a web layout
- $layout
- in file search_view.php, variable SearchView::$layout
This view is drawn on a web layout
- $layout
- in file fetch_view.php, variable FetchView::$layout
No layout is used for this view
- $layout
- in file nocache_view.php, variable NocacheView::$layout
This view is drawn on a web layout
- $layout
- in file settings_view.php, variable SettingsView::$layout
This view is drawn on a web layout
- $layout
- in file view.php, variable View::$layout
The name of the type of layout object that the view is drawn on
- $layout
- in file static_view.php, variable StaticView::$layout
This view is drawn on a web layout
- $layout
- in file statistics_view.php, variable StatisticsView::$layout
This view is drawn on a web layout
- $layout
- in file signin_view.php, variable SigninView::$layout
This view is drawn on a web layout
- $layout_object
- in file view.php, variable View::$layout_object
The reference to the layout object that the view is drawn on
- $len_all_docs
- in file index_shard.php, variable IndexShard::$len_all_docs
Number of words stored in total in all documents in this shard
- $len_all_link_docs
- in file index_shard.php, variable IndexShard::$len_all_link_docs
Number of words stored in total in all links in this shard
- $limit
- in file database_bundle_iterator.php, variable DatabaseBundleIterator::$limit
Current result row of the query that the iterator has processed to
- $limit
- in file mix_archive_bundle_iterator.php, variable MixArchiveBundleIterator::$limit
Count of how far into the crawl mix we've gone.
- $limit
- in file network_iterator.php, variable NetworkIterator::$limit
Current limit number to be added to base query
- $locale_name
- in file locale_model.php, variable LocaleModel::$locale_name
Locale name as a string in the locale's language
- $locale_tag
- in file locale_model.php, variable LocaleModel::$locale_tag
IANA tag name of current locale
- lang
- in file html_processor.php, method HtmlProcessor::lang()
Determines the language of the html document by looking at the root language attribute. If that fails $sample_text is used to try to guess the language
- LANG
- in file crawl_constants.php, class constant CrawlConstants::LANG
- lang
- in file pptx_processor.php, method PptxProcessor::lang()
Determines the language of the xml document by looking at the language attribute of a tag.
- lang
- in file rss_processor.php, method RssProcessor::lang()
Determines the language of the rss document by looking at the channel language tag
- LanguageElement
- in file language_element.php, class LanguageElement
Element used to display available languages in the settings view
- languageTestCase
- in file xlsx_processor_test.php, method XlsxProcessorTest::languageTestCase()
Tests that the language is correct
- Layout
- in file layout.php, class Layout
Base layout Class. Layouts are used to render the headers and footer of the page on which a View lives
- lcfirst
- in file index.php, function lcfirst()
Lower cases the first letter in a string
- lessThan
- in file utility.php, function lessThan()
Callback to check if $a is less than $b
- lessThanLocale
- in file locale_model.php, function lessThanLocale()
Function for comparing two locale arrays by locale tag so that they can be sorted
- locale_functions.php
- procedural page locale_functions.php
- linkAndTexts
- in file rss_processor.php, method RssProcessor::linkAndTexts()
Returns a url text pair where the url comes from the link of the given item node and the text comes from the text data for that node.
- links
- in file html_processor.php, method HtmlProcessor::links()
Returns up to MAX_LINKS_TO_EXTRACT many links from the supplied dom object where links have been canonicalized according to the supplied $site information.
- links
- in file rss_processor.php, method RssProcessor::links()
Returns up to MAX_LINKS_PER_PAGE many links from the supplied dom object where links have been canonicalized according to the supplied $site information.
- links
- in file pptx_processor.php, method PptxProcessor::links()
Returns up to MAX_LINKS_PER_PAGE many links from the supplied dom object where links have been canonicalized according to the supplied $site information.
- LINKS
- in file crawl_constants.php, class constant CrawlConstants::LINKS
- links
- in file sitemap_processor.php, method SitemapProcessor::links()
Returns links from the supplied dom object of a sitemap
- links
- in file xlsx_processor.php, method XlsxProcessor::links()
Returns up to MAX_LINKS_PER_PAGE many links from the supplied dom object where links have been canonicalized according to the supplied $site information.
- linksTestCase
- in file xlsx_processor_test.php, method XlsxProcessorTest::linksTestCase()
Tests that the links are correct
- linksToHtml
- in file odp_rdf_bundle_iterator.php, method OdpRdfArchiveBundleIterator::linksToHtml()
Makes an unordered HTML list out of an associative array of url => link_text pairs.
- LINK_FLAG
- in file index_shard.php, class constant IndexShard::LINK_FLAG
Used to keep track of whether a record in document infos is for a
- LINK_LENGTH
- in file crawl_constants.php, class constant CrawlConstants::LINK_LENGTH
- LINK_SEEN_URLS
- in file crawl_constants.php, class constant CrawlConstants::LINK_SEEN_URLS
- LINK_WEIGHT
- in file config.php, constant LINK_WEIGHT
BM25F weight for other text within links to a doc
- LINK_WORDS
- in file crawl_constants.php, class constant CrawlConstants::LINK_WORDS
- LINK_WORD_SCORE
- in file crawl_constants.php, class constant CrawlConstants::LINK_WORD_SCORE
- listTests
- in file index.php, function listTests()
This function is responsible for listing out HTML links to the available
- load
- in file string_array.php, method StringArray::load()
Load a StringArray from a file
- load
- in file persistent_structure.php, method PersistentStructure::load()
Load a PersistentStructure from a file
- load
- in file index_shard.php, method IndexShard::load()
Load an IndexShard from a file or string
- loadCronTable
- in file cron_model.php, method CronModel::loadCronTable()
Loads into $this->cron_table the associative array of key => timestamps
- loadMetaData
- in file bloom_filter_bundle.php, method BloomFilterBundle::loadMetaData()
Loads from the filter bundles' meta.txt the meta data associated with
- LocaleModel
- in file locale_model.php, class LocaleModel
Used to encapsulate information about a locale (data about a language in a given region).
- LOCALE_DIR
- in file config.php, constant LOCALE_DIR
- location
- in file html_processor.php, method HtmlProcessor::location()
- LOCATION
- in file crawl_constants.php, class constant CrawlConstants::LOCATION
- log
- in file machine_controller.php, method MachineController::log()
Used to retrieve a fetcher/queue_server logfile for the current
- LOGGING
- in file crawl_constants.php, class constant CrawlConstants::LOGGING
- loginDbms
- in file model.php, method Model::loginDbms()
Returns whether the provided dbms needs a login and password or not (sqlite or sqlite3)
- LOG_DIR
- in file config.php, constant LOG_DIR
- LOG_LISTING_LEN
- in file machine_controller.php, class constant MachineController::LOG_LISTING_LEN
Number of characters from end of most recent log file to return
- LOG_TO_FILES
- in file arc_tool.php, constant LOG_TO_FILES
This tool does not need logging
- longlines
- in file code_tool.php, function longlines()
Searches for and echoes line numbers and lines for lines of length greater than 80 characters in files in the supplied sub-folder/file,
- lookup
- in file hash_table.php, method HashTable::lookup()
Tries to look up the key in the hash table and returns either the location where it was found or the value associated with the key.
- lookupArray
- in file hash_table.php, method HashTable::lookupArray()
Tries to look up the key in the hash table and returns either the location where it was found or the value associated with the key.
- lookupDoc
- in file group_iterator.php, method GroupIterator::lookupDoc()
Looks up a doc for a link doc_key, so that its summary info can be retrieved
- lookupHashTable
- in file web_queue_bundle.php, method WebQueueBundle::lookupHashTable()
Looks up $key in the to-crawl hash table
- lookupSummaryOffsetGeneration
- in file parallel_model.php, method ParallelModel::lookupSummaryOffsetGeneration()
Determines the offset into the summaries WebArchiveBundle and generation of the provided url (or hash_url) so that the info:url (info:base64_hash_url) summary can be retrieved. This assumes of course that the info:url meta word has been stored.
- lookupTranslation
- in file locale_model.php, method LocaleModel::lookupTranslation()
Translates a string_id from among translation array data in $new_configure (most preferred, probably comes from recent web form data), $old_configure (probably from work dir), and $fallback_configure (probably from base dir of Yioop instance, least preferred).
- loop
- in file fetcher.php, method Fetcher::loop()
Main loop for the fetcher.
- loop
- in file queue_server.php, method QueueServer::loop()
Main runtime loop of the queue_server.
- loop
- in file news_updater.php, method NewsUpdater::loop()
Main loop for the news updater.
- loop
- in file mirror.php, method Mirror::loop()
Main loop for the mirror script.
- loop
- in file configure_tool.php, method ConfigureTool::loop()
This is the main loop where options of what the user can configure are presented, a choice is requested, and so on...
- locale_model.php
- procedural page locale_model.php
- language_element.php
- procedural page language_element.php
- layout.php
- procedural page layout.php
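As an illustration of the linksToHtml entry in this section (an unordered HTML list built from url => link_text pairs), here is a minimal Python sketch. It is not the PHP code from odp_rdf_bundle_iterator.php; the function name links_to_html, the exact markup, and the HTML escaping are assumptions made for the example.

```python
from html import escape


def links_to_html(links):
    """Build an unordered HTML list from a dict of url -> link_text pairs.

    Hypothetical helper: the markup layout and escaping are assumptions,
    not taken from the original PHP method.
    """
    items = [
        '<li><a href="{}">{}</a></li>'.format(escape(url, quote=True), escape(text))
        for url, text in links.items()
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    print(links_to_html({"http://a.example/": "A & B"}))
```

Escaping both the url and the link text keeps characters such as & from breaking the generated markup.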
m
- $machine_urls
- in file statistics_controller.php, variable StatisticsController::$machine_urls
Machines (string urls) which may have portions of the web crawl
- $max_hash_ops_before_rebuild
- in file web_queue_bundle.php, variable WebQueueBundle::$max_hash_ops_before_rebuild
Number of non-read operations on the hash table before it needs to be rebuilt.
- $max_suffix_pos
- in file tokenizer.php, variable ItStemmer::$max_suffix_pos
Storage for computing the starting position for the longest suffix
- $max_tier
- in file index_dictionary.php, variable IndexDictionary::$max_tier
The highest tiered index in the IndexDictionary
- $menu
- in file configure_tool.php, variable ConfigureTool::$menu
Holds the main menu data for the configuration tool
- $meta_words_list
- in file phrase_model.php, variable PhraseModel::$meta_words_list
A list of meta words that might be extracted from a query
- $min_or_max
- in file web_queue_bundle.php, variable WebQueueBundle::$min_or_max
whether polling the first element of the priority queue returns the
- $min_or_max
- in file priority_queue.php, variable PriorityQueue::$min_or_max
Whether polling the queue returns the least or most weighted value
- $mix_timestamp
- in file mix_archive_bundle_iterator.php, variable MixArchiveBundleIterator::$mix_timestamp
Used to hold timestamp of the crawl mix being used to iterate over
- $models
- in file search_controller.php, variable SearchController::$models
Says which models to load for this controller.
- $models
- in file controller.php, variable Controller::$models
Array of the model classes used by this controller
- $models
- in file admin_controller.php, variable AdminController::$models
Says which models to load for this controller.
- $models
- in file crawl_controller.php, variable CrawlController::$models
No models used by this controller
- $models
- in file fetch_controller.php, variable FetchController::$models
No models used by this controller
- $models
- in file settings_controller.php, variable SettingsController::$models
LocaleModel used to get the available languages/locales, CrawlModel
- $models
- in file machine_controller.php, variable MachineController::$models
No models used by this controller
- $models
- in file resource_controller.php, variable ResourceController::$models
No models used by this controller
- $models
- in file statistics_controller.php, variable StatisticsController::$models
No models used by this controller
- $models
- in file archive_controller.php, variable ArchiveController::$models
This controller does not make use of any models
- $models
- in file static_controller.php, variable StaticController::$models
Says which models to load for this controller.
- $more_results
- in file network_iterator.php, variable NetworkIterator::$more_results
Flags for each server saying if there are more results for that server or not
- $most_recent_fetcher
- in file queue_server.php, variable QueueServer::$most_recent_fetcher
IP address as a string of the fetcher that most recently spoke with the queue_server.
- mirror.php
- procedural page mirror.php
- machine_controller.php
- procedural page machine_controller.php
- mediawiki_bundle_iterator.php
- procedural page mediawiki_bundle_iterator.php
- mix_archive_bundle_iterator.php
- procedural page mix_archive_bundle_iterator.php
- MACHINE
- in file crawl_constants.php, class constant CrawlConstants::MACHINE
- MachineController
- in file machine_controller.php, class MachineController
This class handles requests from a computer that is managing several fetchers and queue_servers. This controller might be used to start, stop fetchers/queue_server as well as get status on the active fetchers
- MachinelogElement
- in file machinelog_element.php, class MachinelogElement
Element responsible for displaying the queue_server or fetcher log of a machine
- MachineModel
- in file machine_model.php, class MachineModel
This class is used to handle db results related to Machine Administration
- machineStatus
- in file admin_controller.php, method AdminController::machineStatus()
Gets data from the machineModel concerning the on/off states of the machines managed by this Yioop instance and then passes this data to the machinestatus view.
- MachinestatusView
- in file machinestatus_view.php, class MachinestatusView
This view is used to display information about the on/off state of the queue_servers and fetchers managed by this instance of Yioop.
- MACHINE_ID
- in file crawl_constants.php, class constant CrawlConstants::MACHINE_ID
- MACHINE_URI
- in file crawl_constants.php, class constant CrawlConstants::MACHINE_URI
- MAGIC
- in file bzip2_block_iterator.php, class constant BZip2BlockIterator::MAGIC
String to tell if file is a bz2 file
- main
- in file bzip2_block_iterator.php, function main()
Command-line shell for testing the class
- MAINTENANCE_MODE
- in file config.php, constant MAINTENANCE_MODE
Maintenance mode restricts access to local machine
- makeBuffer
- in file text_archive_bundle_iterator.php, method TextArchiveBundleIterator::makeBuffer()
Reads in block $this->buffer_block_num of size self::BUFFER_SIZE from the archive file
- makeCanonicalRobotPath
- in file robot_processor.php, method RobotProcessor::makeCanonicalRobotPath()
For robot paths
- makeItem
- in file index_shard.php, method IndexShard::makeItem()
Return (docid, item) where item has document statistics (summary offset, relevance, doc rank, and score) for the document given by the supplied posting, based on the posting list's num docs with word, and the number of occurrences of the word in the doc.
- makeLookupLink
- in file network_iterator.php, method NetworkIterator::makeLookupLink()
Called to make a link for AnalyticsManager about a network query performed by this iterator.
- makeMediaGroups
- in file search_controller.php, method SearchController::makeMediaGroups()
Groups search result pages together which have thumbnails
- makeNWordGramsFiles
- in file token_tool.php, function makeNWordGramsFiles()
Makes an n or all word gram Bloom filter based on the supplied arguments and writes it into the resources folder of the given locale. Wikipedia files are assumed to have been placed in the PREP_DIR before this is run.
- makeNWordGramsFilterFile
- in file nword_grams.php, method NWordGrams::makeNWordGramsFilterFile()
Creates a bloom filter file from an n word gram text file. The path of the n word gram text file used is based on the input $lang.
- makeNWordGramsTextFile
- in file nword_grams.php, method NWordGrams::makeNWordGramsTextFile()
Generates an n word grams text file from an input Wikipedia xml file.
- makePrefixLetters
- in file index_dictionary.php, method IndexDictionary::makePrefixLetters()
Makes dictionary sub-directories for each of the 256 possible first hash characters that crawlHash in raw mode could output.
- makePrefixRecord
- in file index_dictionary.php, method IndexDictionary::makePrefixRecord()
Makes a prefix record string out of an offset and count (packs and concatenates).
- makeReferences
- in file mediawiki_bundle_iterator.php, method MediaWikiArchiveBundleIterator::makeReferences()
Used to make a reference list for a wiki page based on the cite tags on that page.
- makeSuggestTrie
- in file token_tool.php, function makeSuggestTrie()
Makes a trie that can be used to make word suggestions as someone enters terms into the Yioop! search box. Outputs the result into the file suggest_trie.txt.gz in the supplied locale dir
- makeTableCallback
- in file mediawiki_bundle_iterator.php, function makeTableCallback()
Callback used by a preg_replace_callback in nextPage to make a table
- makeTableOfContents
- in file mediawiki_bundle_iterator.php, method MediaWikiArchiveBundleIterator::makeTableOfContents()
Used to make a table of contents for a wiki page based on the level two headings on that page.
- makeWords
- in file index_shard.php, method IndexShard::makeWords()
Callback function for load method. Splits a word_key . word_info string into an entry in the passed shard $shard->words[word_key] = $word_info.
- makeWorkDirectory
- in file profile_model.php, method ProfileModel::makeWorkDirectory()
Creates a folder to be used to maintain local information about this instance of the Yioop/SeekQuarry engine
- manageAccount
- in file admin_controller.php, method AdminController::manageAccount()
Used to handle the change current user password admin activity
- ManageaccountElement
- in file manageaccount_element.php, class ManageaccountElement
Element responsible for displaying the user account features that someone can modify for their own SeekQuarry/Yioop account.
- manageCrawls
- in file admin_controller.php, method AdminController::manageCrawls()
Used to handle the manage crawl activity.
- ManagecrawlsElement
- in file managecrawls_element.php, class ManagecrawlsElement
Element responsible for displaying info about starting, stopping, deleting, and using a crawl. It makes use of the CrawlStatusView
- manageLocales
- in file admin_controller.php, method AdminController::manageLocales()
Handles admin request related to the manage locale activity
- ManagelocalesElement
- in file managelocales_element.php, class ManagelocalesElement
This Element is responsible for drawing screens in the Admin View related to localization. Namely, the ability to create, delete, and set the text writing mode for locales as well as the ability to modify translations within a locale.
- manageMachines
- in file admin_controller.php, method AdminController::manageMachines()
Handles admin request related to managing the machines which perform crawls
- ManagemachinesElement
- in file managemachines_element.php, class ManagemachinesElement
Used to draw the admin screen on which admin users can add/delete and manage machines which might act as fetchers or queue_servers.
- manageRoles
- in file admin_controller.php, method AdminController::manageRoles()
Used to handle the manage role activity.
- ManagerolesElement
- in file manageroles_element.php, class ManagerolesElement
Used to draw the admin screen on which admin users can create roles, delete roles and add and delete roles from users
- manageUsers
- in file admin_controller.php, method AdminController::manageUsers()
Used to handle the manage user activity.
- ManageusersElement
- in file manageusers_element.php, class ManageusersElement
Element responsible for drawing the activity screen for User manipulation in the AdminView.
- mapPath
- in file code_tool.php, function mapPath()
Applies the function $callback to each file in $path
- markChildren
- in file search_controller.php, method SearchController::markChildren()
Used in rendering a cached web page to highlight the search terms.
- matchDefine
- in file profile_model.php, method ProfileModel::matchDefine()
Finds the first occurrence of define('$defined', something) in $string and returns something
- MAX
- in file crawl_constants.php, class constant CrawlConstants::MAX
- MAXIMUM_CRAWL_DELAY
- in file config.php, constant MAXIMUM_CRAWL_DELAY
if the robots.txt has a Crawl-delay larger than this value don't crawl the site.
- maxQueueTestCase
- in file priority_queue_test.php, method PriorityQueueTest::maxQueueTestCase()
Insert five items into a priority queue. Checks that the resulting heap
- maxSuffix
- in file tokenizer.php, method ItStemmer::maxSuffix()
Computes the longest suffix for a given string from a given set of suffixes
- MAX_ARCHIVE_OBJECT_SIZE
- in file config.php, constant MAX_ARCHIVE_OBJECT_SIZE
largest sized object allowed in a web archive (used to sanity check
- MAX_BUFFER_DOCS
- in file arc_tool.php, class constant ArcTool::MAX_BUFFER_DOCS
The maximum number of documents the arc_tool list function will read into memory in one go.
- MAX_COPY_TRIES
- in file source_model.php, class constant SourceModel::MAX_COPY_TRIES
Maximum number of tries to completely copy over old shard on delete
- MAX_DESCRIPTION_LEN
- in file text_processor.php, class constant TextProcessor::MAX_DESCRIPTION_LEN
Max number of chars to extract for description
- MAX_DESCRIPTION_LEN
- in file pptx_processor.php, class constant PptxProcessor::MAX_DESCRIPTION_LEN
Constant for maximum description length
- MAX_DESCRIPTION_LEN
- in file xml_processor.php, class constant XmlProcessor::MAX_DESCRIPTION_LEN
- MAX_DESCRIPTION_LEN
- in file svg_processor.php, class constant SvgProcessor::MAX_DESCRIPTION_LEN
- MAX_DESCRIPTION_LEN
- in file html_processor.php, class constant HtmlProcessor::MAX_DESCRIPTION_LEN
- MAX_DESCRIPTION_LEN
- in file xlsx_processor.php, class constant XlsxProcessor::MAX_DESCRIPTION_LEN
- MAX_DESCRIPTION_LEN
- in file rss_processor.php, class constant RssProcessor::MAX_DESCRIPTION_LEN
Max number of chars to extract for description
- MAX_DIM
- in file bmp_processor.php, class constant BmpProcessor::MAX_DIM
Maximum pixel width or height
- MAX_EXECUTION_TIME
- in file source_model.php, class constant SourceModel::MAX_EXECUTION_TIME
Maximum length of time update/delete news scripts can run in seconds
- MAX_FEEDS_ONE_GO
- in file source_model.php, class constant SourceModel::MAX_FEEDS_ONE_GO
Maximum number of feeds to download in one try
- MAX_FETCH_SIZE
- in file config.php, constant MAX_FETCH_SIZE
maximum number of urls to schedule to a given fetcher in one go
- MAX_LINKS_PER_PAGE
- in file config.php, constant MAX_LINKS_PER_PAGE
maximum number of links to keep after initial extraction
- MAX_LINKS_PER_SITEMAP
- in file config.php, constant MAX_LINKS_PER_SITEMAP
maximum number of links to consider from a sitemap page
- MAX_LINKS_TO_EXTRACT
- in file config.php, constant MAX_LINKS_TO_EXTRACT
maximum number of links to extract from a page on an initial pass
- MAX_LINKS_WORD_TEXT
- in file config.php, constant MAX_LINKS_WORD_TEXT
maximum number of words from links to consider on any given page
- MAX_LOG_FILE_SIZE
- in file config.php, constant MAX_LOG_FILE_SIZE
maximum size of a log file before it is rotated
- MAX_PAGES_TO_SHOW
- in file pagination_helper.php, class constant PaginationHelper::MAX_PAGES_TO_SHOW
The maximum numbered links to pages to show besides the next and
- MAX_PHRASE_LEN
- in file config.php, constant MAX_PHRASE_LEN
maximum length + 1 of exact phrase matches
- MAX_QUERY_TERMS
- in file config.php, constant MAX_QUERY_TERMS
maximum number of terms allowed in a conjunctive search query
- MAX_RECORD_SIZE
- in file text_archive_bundle_iterator.php, class constant TextArchiveBundleIterator::MAX_RECORD_SIZE
Estimate of the maximum size of a record stored in a text archive
- MAX_THUMB_LEN
- in file svg_processor.php, class constant SvgProcessor::MAX_THUMB_LEN
- MAX_TITLE_LEN
- in file html_processor.php, class constant HtmlProcessor::MAX_TITLE_LEN
- MAX_TITLE_LENGTH
- in file model.php, constant MAX_TITLE_LENGTH
- max_url_archive_offset
- in file web_queue_bundle.php, class constant WebQueueBundle::max_url_archive_offset
The largest offset for the url WebArchive before we rebuild it.
- MAX_URL_LENGTH
- in file config.php, constant MAX_URL_LENGTH
maximum length of urls to try to queue, this is important for
- MAX_WAITING_HOSTS
- in file config.php, constant MAX_WAITING_HOSTS
maximum number of active crawl-delayed hosts
- MediaWikiArchiveBundleIterator
- in file mediawiki_bundle_iterator.php, class MediaWikiArchiveBundleIterator
Used to iterate through a collection of .xml.bz2 media wiki files
- MEMORY_USAGE
- in file crawl_constants.php, class constant CrawlConstants::MEMORY_USAGE
- mergeAllTiers
- in file index_dictionary.php, method IndexDictionary::mergeAllTiers()
Merges for each tier and for each first letter subdirectory,
- mergeTier
- in file index_dictionary.php, method IndexDictionary::mergeTier()
Merges for each first letter subdirectory, the $tier pair of files of dictionary words. The output is stored in $out_slot.
- mergeTierFiles
- in file index_dictionary.php, method IndexDictionary::mergeTierFiles()
For a fixed prefix directory merges the $tier pair of files of dictionary words. The output is stored in $out_slot.
- mergeWordPostingsToString
- in file index_shard.php, method IndexShard::mergeWordPostingsToString()
Used to flatten the words associative array to a more memory efficient word_postings string.
- META_WORDS
- in file crawl_constants.php, class constant CrawlConstants::META_WORDS
- metricToInt
- in file utility.php, function metricToInt()
Converts a string of the form some int followed by K, M, or G.
- migrateDatabaseIfNecessary
- in file profile_model.php, method ProfileModel::migrateDatabaseIfNecessary()
Check if $dbinfo provided the connection details for a Yioop/SeekQuarry database. If it does provide a valid db connection but no data then try to recreate the database from the default copy stored in /data dir.
- MIN
- in file crawl_constants.php, class constant CrawlConstants::MIN
- MINIMUM_FETCH_LOOP_TIME
- in file config.php, constant MINIMUM_FETCH_LOOP_TIME
fetcher must wait at least this long between multi-curl requests
- MINIMUM_UPDATE_LOOP_TIME
- in file news_updater.php, constant MINIMUM_UPDATE_LOOP_TIME
Shortest time through one iteration of news updater's loop
- minQueueTestCase
- in file priority_queue_test.php, method PriorityQueueTest::minQueueTestCase()
Inserts five elements into a minimum priority queue. The resulting heap array is compared to expected. Then repeated polling is done to make sure the objects come out in the correct order.
- MIN_DESCRIPTION_LENGTH
- in file parallel_model.php, class constant ParallelModel::MIN_DESCRIPTION_LENGTH
the minimum length of a description before we stop appending
- MIN_DESCRIPTION_LENGTH
- in file group_iterator.php, class constant GroupIterator::MIN_DESCRIPTION_LENGTH
the minimum length of a description before we stop appending
- MIN_FIND_RESULTS_PER_BLOCK
- in file group_iterator.php, class constant GroupIterator::MIN_FIND_RESULTS_PER_BLOCK
the minimum number of pages to group from a block;
- MIN_FIND_RESULTS_PER_BLOCK
- in file network_iterator.php, class constant NetworkIterator::MIN_FIND_RESULTS_PER_BLOCK
the minimum number of pages to group from a block;
- MIN_QUEUE_WEIGHT
- in file config.php, constant MIN_QUEUE_WEIGHT
Minimum weight in priority queue before rebuilt
- MIN_RESULTS_TO_GROUP
- in file config.php, constant MIN_RESULTS_TO_GROUP
If that many exist, the minimum number of results to get
- MIN_SNIPPET_LENGTH
- in file model.php, constant MIN_SNIPPET_LENGTH
- Mirror
- in file mirror.php, class Mirror
This class is responsible for syncing crawl archives between machines using the SeekQuarry/Yioop search engine
- mirrorHandle
- in file search_controller.php, method SearchController::mirrorHandle()
Used to check if there are any mirrors of the current server.
- MIRROR_NOTIFY_FREQUENCY
- in file config.php, constant MIRROR_NOTIFY_FREQUENCY
How often mirror script tries to notify machine it is mirroring that it
- MIRROR_SYNC_FREQUENCY
- in file config.php, constant MIRROR_SYNC_FREQUENCY
How often mirror script tries to synchronize with machine it is mirroring
- mirror_table_name
- in file crawl_constants.php, class constant CrawlConstants::mirror_table_name
- MixArchiveBundleIterator
- in file mix_archive_bundle_iterator.php, class MixArchiveBundleIterator
Used to do an archive crawl based on the results of a crawl mix.
- mixCrawls
- in file admin_controller.php, method AdminController::mixCrawls()
Handles admin request related to the crawl mix activity
- MixcrawlsElement
- in file mixcrawls_element.php, class MixcrawlsElement
Element responsible for displaying info to allow a user to create a crawl mix or edit an existing one
- MOBILE
- in file config.php, constant MOBILE
- Model
- in file model.php, class Model
This is a base class for all models in the SeekQuarry search engine. It provides support functions for formatting search results
- mysql_manager.php
- procedural page mysql_manager.php
- machine_model.php
- procedural page machine_model.php
- model.php
- procedural page model.php
- MODIFIED
- in file crawl_constants.php, class constant CrawlConstants::MODIFIED
- MysqlManager
- in file mysql_manager.php, class MysqlManager
Mysql DatasourceManager
- machinelog_element.php
- procedural page machinelog_element.php
- manageaccount_element.php
- procedural page manageaccount_element.php
- managecrawls_element.php
- procedural page managecrawls_element.php
- managelocales_element.php
- procedural page managelocales_element.php
- managemachines_element.php
- procedural page managemachines_element.php
- manageroles_element.php
- procedural page manageroles_element.php
- manageusers_element.php
- procedural page manageusers_element.php
- mixcrawls_element.php
- procedural page mixcrawls_element.php
- machinestatus_view.php
- procedural page machinestatus_view.php
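The metricToInt entry in this section describes converting a string made of an integer followed by K, M, or G. A minimal Python sketch of that kind of conversion follows. It is not the PHP function from utility.php: the name metric_to_int, the decimal multipliers (1000, 1000^2, 1000^3), and the case-insensitive suffix handling are all assumptions made for illustration.

```python
def metric_to_int(metric_num):
    """Convert values such as "10K", "2M", or "1G" to an integer.

    Assumption: suffixes are decimal multipliers (K = 1000, M = 1000**2,
    G = 1000**3) and are matched case-insensitively. Input without a
    suffix is parsed as a plain integer.
    """
    multipliers = {"K": 1000, "M": 1000 ** 2, "G": 1000 ** 3}
    text = str(metric_num).strip()
    if text and text[-1].upper() in multipliers:
        # Split off the suffix and scale the numeric part.
        return int(text[:-1]) * multipliers[text[-1].upper()]
    return int(text)


if __name__ == "__main__":
    for sample in ("10K", "2M", "1G", "42"):
        print(sample, metric_to_int(sample))
```

If binary multipliers (1024-based) were intended instead, only the multipliers table would need to change.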
n
- $name
- in file crawl_daemon.php, variable CrawlDaemon::$name
Name prefix to be used on files associated with this daemon
- $name
- in file epub_processor.php, variable EpubProcessor::$name
The name of the tag element in an xml document
- $name_server
- in file mirror.php, variable Mirror::$name_server
Url or IP address of the name_server to get sites to crawl from
- $name_server
- in file fetcher.php, variable Fetcher::$name_server
Urls or IP address of the web_server used to administer this instance of Yioop. Used to figure out available queue_servers to contact for crawling data
- $network_flag
- in file group_iterator.php, variable GroupIterator::$network_flag
Whether the iterator is being used for a network query
- $next_offset
- in file doc_iterator.php, variable DocIterator::$next_offset
The next byte offset of a doc in the IndexShard
- $next_offset
- in file word_iterator.php, variable WordIterator::$next_offset
The next byte offset in the IndexShard
- $ngrams
- in file nword_grams.php, variable NWordGrams::$ngrams
Static copy of n-grams files
- $notifier
- in file priority_queue.php, variable PriorityQueue::$notifier
An object that implements the Notifier interface (for instance,
- $no_process_links
- in file fetcher.php, variable Fetcher::$no_process_links
When processing recrawl data this says to assume the data has already had its links extracted into a field and so this doesn't have to be done in a separate step
- $no_stem_list
- in file tokenizer.php, variable EnStemmer::$no_stem_list
- $null
- in file hash_table.php, variable HashTable::$null
Holds an all \0 string of length $this->key_size
- $num_affected
- in file pdo_manager.php, variable PdoManager::$num_affected
The number of rows affected by the last exec
- $num_docs
- in file index_shard.php, variable IndexShard::$num_docs
Number of documents (not links) stored in this shard
- $num_docs
- in file index_bundle_iterator.php, variable IndexBundleIterator::$num_docs
Estimate of the number of documents that this iterator can return
- $num_docs_per_generation
- in file index_archive_bundle.php, variable IndexArchiveBundle::$num_docs_per_generation
Number of docs before a new generation is started
- $num_docs_per_generation
- in file index_shard.php, variable IndexShard::$num_docs_per_generation
This is supposed to hold the number of documents that a given shard can hold.
- $num_extra_bits
- in file bzip2_block_iterator.php, variable BZip2BlockIterator::$num_extra_bits
Store how many left-over bits there are
- $num_filters
- in file bloom_filter_bundle.php, variable BloomFilterBundle::$num_filters
Total number of filters that this filter bundle currently has
- $num_generations
- in file word_iterator.php, variable WordIterator::$num_generations
The total number of shards that have data for this word
- $num_generations
- in file doc_iterator.php, variable DocIterator::$num_generations
The total number of shards that have data for this word
- $num_iterators
- in file negation_iterator.php, variable NegationIterator::$num_iterators
Number of elements in $this->index_bundle_iterators
- $num_iterators
- in file intersect_iterator.php, variable IntersectIterator::$num_iterators
Number of elements in $this->index_bundle_iterators
- $num_iterators
- in file union_iterator.php, variable UnionIterator::$num_iterators
Number of elements in $this->index_bundle_iterators
- $num_keys
- in file bloom_filter_file.php, variable BloomFilterFile::$num_keys
Number of bit positions in the Bloom filter used to say an item is
- $num_link_docs
- in file index_shard.php, variable IndexShard::$num_link_docs
Number of links (not documents) stored in this shard
- $num_partitions
- in file text_archive_bundle_iterator.php, variable TextArchiveBundleIterator::$num_partitions
The number of arc files in this arc archive bundle
- $num_partitions
- in file web_archive_bundle_iterator.php, variable WebArchiveBundleIterator::$num_partitions
Number of web archive objects in this web archive bundle
- $num_partitions_summaries
- in file index_archive_bundle.php, variable IndexArchiveBundle::$num_partitions_summaries
Number of partitions in the summaries WebArchiveBundle
- $num_seen_sites
- in file fetcher.php, variable Fetcher::$num_seen_sites
Number of sites crawled in the current crawl
- $num_urls_ram
- in file web_queue_bundle.php, variable WebQueueBundle::$num_urls_ram
number of entries the priority queue used by this web queue bundle
- $num_values
- in file string_array.php, variable StringArray::$num_values
Number of items to be stored in the StringArray
- $num_values
- in file priority_queue.php, variable PriorityQueue::$num_values
Number of values that can be stored in the priority queue
- $num_words
- in file intersect_iterator.php, variable IntersectIterator::$num_words
Number of elements in $this->word_iterator_map
- news_updater.php
- procedural page news_updater.php
- non_compressor.php
- procedural page non_compressor.php
- negation_iterator.php
- procedural page negation_iterator.php
- network_iterator.php
- procedural page network_iterator.php
- notifier.php
- procedural page notifier.php
- nword_grams.php
- procedural page nword_grams.php
- nameServer
- in file configure_tool.php, method ConfigureTool::nameServer()
Configures settings relating to the location of the name server and the salt used when communicating with it. Also, configures caching mechanisms the name server should use when returning results.
- name_archive_iterator
- in file crawl_constants.php, class constant CrawlConstants::name_archive_iterator
- NAME_SERVER
- in file config.php, constant NAME_SERVER
- NEEDS_OFFSET_FLAG
- in file crawl_constants.php, class constant CrawlConstants::NEEDS_OFFSET_FLAG
- NegationIterator
- in file negation_iterator.php, class NegationIterator
Used to iterate over the documents which don't occur in a set of iterator results
- networkGetCrawlItems
- in file parallel_model.php, method ParallelModel::networkGetCrawlItems()
In a multiple queue server setting, gets summaries for a set of document
- NetworkIterator
- in file network_iterator.php, class NetworkIterator
This iterator is used to handle querying a network of queue_servers with regard to a query
- network_base_name
- in file crawl_constants.php, class constant CrawlConstants::network_base_name
- network_crawllist_base_name
- in file crawl_constants.php, class constant CrawlConstants::network_crawllist_base_name
- newsUpdate
- in file search_controller.php, method SearchController::newsUpdate()
If news_update time has passed, then updates news feeds associated with this Yioop instance
- NewsUpdater
- in file news_updater.php, class NewsUpdater
Separate process/command-line script which can be used to update news sources for Yioop. This is an alternative to using the web app for updating. Makes use of the web app's code.
- NEWS_MODE
- in file config.php, constant NEWS_MODE
- NEW_CRAWL
- in file crawl_constants.php, class constant CrawlConstants::NEW_CRAWL
- nextBlock
- in file bzip2_block_iterator.php, method BZip2BlockIterator::nextBlock()
Extracts the next bz2 block from the bzip2 file this iterator works
- nextChunk
- in file text_archive_bundle_iterator.php, method TextArchiveBundleIterator::nextChunk()
Called to get the next chunk of BUFFER_SIZE + 2 MAX_RECORD_SIZE bytes
- nextDocsWithWord
- in file index_bundle_iterator.php, method IndexBundleIterator::nextDocsWithWord()
Gets the current block of doc summaries for the word iterator and advances the current pointer to the next block of documents. If a doc index is given, the next block must be of docs after this doc_index
- nextObjects
- in file web_archive.php, method WebArchive::nextObjects()
Returns $num many objects from the web archive starting at the current iterator position. The iterator is advanced to the object after the last one returned
- nextPage
- in file warc_archive_bundle_iterator.php, method WarcArchiveBundleIterator::nextPage()
Gets the next doc from the iterator
- nextPage
- in file text_archive_bundle_iterator.php, method TextArchiveBundleIterator::nextPage()
Gets the next doc from the iterator
- nextPage
- in file odp_rdf_bundle_iterator.php, method OdpRdfArchiveBundleIterator::nextPage()
Gets the next doc from the iterator
- nextPage
- in file mediawiki_bundle_iterator.php, method MediaWikiArchiveBundleIterator::nextPage()
Gets the next doc from the iterator
- nextPage
- in file arc_archive_bundle_iterator.php, method ArcArchiveBundleIterator::nextPage()
Gets the next doc from the iterator
- nextPages
- in file mix_archive_bundle_iterator.php, method MixArchiveBundleIterator::nextPages()
Gets the next $num many docs from the iterator
- nextPages
- in file database_bundle_iterator.php, method DatabaseBundleIterator::nextPages()
Gets the next at most $num many docs from the iterator. It might return less than $num many documents if the partition changes or the end of the bundle is reached.
- nextPages
- in file archive_bundle_iterator.php, method ArchiveBundleIterator::nextPages()
Gets the next $num many docs from the iterator
- nextPages
- in file web_archive_bundle_iterator.php, method WebArchiveBundleIterator::nextPages()
Gets the next $num many docs from the iterator
- nextPages
- in file text_archive_bundle_iterator.php, method TextArchiveBundleIterator::nextPages()
Gets the next at most $num many docs from the iterator. It might return less than $num many documents if the partition changes or the end of the bundle is reached.
- nextPostingOffsetDocOffset
- in file index_shard.php, method IndexShard::nextPostingOffsetDocOffset()
Finds the first posting offset between $start_offset and $end_offset of a posting that has a doc_offset bigger than or equal to $doc_offset. This is implemented using a galloping search (double the offset until the value found gets larger, then binary search).
- ngramsContains
- in file nword_grams.php, method NWordGrams::ngramsContains()
Says whether or not phrase exists in the N word gram Bloom Filter
- NocacheView
- in file nocache_view.php, class NocacheView
This view is drawn when someone clicks on the cached link of a web-page for which no cache is available
- NonCompressor
- in file non_compressor.php, class NonCompressor
Implementation of a trivial Compressor.
- nonNetworkGetCrawlItems
- in file parallel_model.php, method ParallelModel::nonNetworkGetCrawlItems()
Gets summaries on a particular machine for a set of document by
- normalize
- in file priority_queue.php, method PriorityQueue::normalize()
Scales the weights of elements in the queue so that the sum of the new weights is $new_total
- normalize
- in file web_queue_bundle.php, method WebQueueBundle::normalize()
Makes the weight sum of the to-crawl priority queue sum to $new_total
- NORMALIZE_FREQUENCY
- in file config.php, constant NORMALIZE_FREQUENCY
how often in OPIC we should make the sum of weights total MAX_URLS
- Notifier
- in file notifier.php, class Notifier
A Notifier is an object which will be notified by a priority queue when the index of some data item in the queue (viewed as an array) has been changed.
- notify
- in file web_queue_bundle.php, method WebQueueBundle::notify()
Callback which is called when an item in the priority queue changes position. The position is updated in the hash table.
- notify
- in file notifier.php, method Notifier::notify()
Handles the update of the index of a data item in a queue with respect to the Notifier object.
- notInTestCase
- in file bloom_filter_file_test.php, method BloomFilterFileTest::notInTestCase()
Tests that if nothing is in the bloom filter yet, that if we do a lookup
- NO_CACHE
- in file arc_tool.php, constant NO_CACHE
NO_CACHE means don't try to use memcache
- NO_DATA_STATE
- in file crawl_constants.php, class constant CrawlConstants::NO_DATA_STATE
- NO_DEBUG_INFO
- in file config.php, constant NO_DEBUG_INFO
Don't display any query info
- NO_FEEDS
- in file statistics_controller.php, constant NO_FEEDS
- NO_FLAGS
- in file web_queue_bundle.php, class constant WebQueueBundle::NO_FLAGS
Url type flag
- NO_LOGGING
- in file news_updater.php, constant NO_LOGGING
We do want logging, but crawl model and other will try to turn off
- NUMBER_OF_BINS
- in file file_cache.php, class constant FileCache::NUMBER_OF_BINS
Total number of bins to cycle between
- NUMBER_OF_LOG_FILES
- in file config.php, constant NUMBER_OF_LOG_FILES
number of log files to rotate amongst
- numDocsOrLinks
- in file index_shard.php, method IndexShard::numDocsOrLinks()
An upper bound on the number of docs or links represented by the start and ending integer offsets into a posting list.
- NUM_CACHE_PAGES
- in file phrase_model.php, class constant PhraseModel::NUM_CACHE_PAGES
Number of pages to cache in one go in memcache or filecache
- NUM_DOCS_PER_GENERATION
- in file config.php, constant NUM_DOCS_PER_GENERATION
number of documents before next gen
- NUM_MULTI_CURL_PAGES
- in file config.php, constant NUM_MULTI_CURL_PAGES
number of multi curl page requests in one go
- NUM_PREFIX_LETTERS
- in file index_dictionary.php, class constant IndexDictionary::NUM_PREFIX_LETTERS
Number of possible prefix records (number of possible values for
- NUM_RECENT_URLS_TO_DISPLAY
- in file config.php, constant NUM_RECENT_URLS_TO_DISPLAY
Number of recently crawled urls to display on admin screen
- NUM_RESULTS_PER_PAGE
- in file config.php, constant NUM_RESULTS_PER_PAGE
default number of search results to display per page
- NUM_TIMES_INTERVAL
- in file statistics_controller.php, class constant StatisticsController::NUM_TIMES_INTERVAL
For size and time distributions the number of times the minimal
- NUM_URLS_QUEUE_RAM
- in file config.php, constant NUM_URLS_QUEUE_RAM
maximum number of urls that will be held in ram
- NWordGrams
- in file nword_grams.php, class NWordGrams
Library of functions used to create and extract n word grams
- nocache_view.php
- procedural page nocache_view.php
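
The nextPostingOffsetDocOffset entry above mentions a galloping search. As a rough illustration of that general technique only (this is not the IndexShard code, and the function name galloping_lower_bound and the plain sorted list standing in for a posting list are assumptions made for this sketch), the idea is to double the step until the target is overshot and then binary search inside the last bracket:

```python
from bisect import bisect_left


def galloping_lower_bound(doc_offsets, target, start=0, end=None):
    """Return the first index i in [start, end) with doc_offsets[i] >= target.

    Returns end if no such index exists. doc_offsets must be sorted ascending.
    Hypothetical helper for illustration; not part of the indexed code base.
    """
    if end is None:
        end = len(doc_offsets)
    if start >= end:
        return end
    if doc_offsets[start] >= target:
        return start

    # Gallop phase: doc_offsets[low] < target always holds here.
    low = start
    step = 1
    while low + step < end and doc_offsets[low + step] < target:
        low += step
        step *= 2  # double the stride after every successful probe

    # The answer now lies in (low, high]; finish with a binary search.
    high = min(low + step, end)
    return bisect_left(doc_offsets, target, low + 1, high)


if __name__ == "__main__":
    offsets = [2, 5, 5, 9, 14, 20, 31]
    print(galloping_lower_bound(offsets, 9))
```

Galloping tends to pay off when the wanted posting is close to the starting offset, since the number of probes grows with the logarithm of the distance travelled rather than with the length of the whole range.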
o
- $overall_index
- in file web_archive_bundle_iterator.php, variable WebArchiveBundleIterator::$overall_index
Index between 0 and $this->count of where the iterator is at
- odp_rdf_bundle_iterator.php
- procedural page odp_rdf_bundle_iterator.php
- objectDictionaryHas
- in file pdf_processor.php, method PdfProcessor::objectDictionaryHas()
Checks if the PDF object's object dictionary is in a list of types
- OdpRdfArchiveBundleIterator
- in file odp_rdf_bundle_iterator.php, class OdpRdfArchiveBundleIterator
Used to iterate through the records of a collection of one or more open
- OFFSET
- in file crawl_constants.php, class constant CrawlConstants::OFFSET
- ONE_DAY
- in file source_model.php, class constant SourceModel::ONE_DAY
Number of seconds in a day
- ONE_HOUR
- in file source_model.php, class constant SourceModel::ONE_HOUR
Number of seconds in an hour
- ONE_WEEK
- in file source_model.php, class constant SourceModel::ONE_WEEK
Number of seconds in a week
- open
- in file web_archive.php, method WebArchive::open()
Open the web archive file associated with this WebArchive object.
- openUrlArchive
- in file web_queue_bundle.php, method WebQueueBundle::openUrlArchive()
Opens the url WebArchive associated with this queue bundle in the
- OPERATING_SYSTEM
- in file crawl_constants.php, class constant CrawlConstants::OPERATING_SYSTEM
- OptionsHelper
- in file options_helper.php, class OptionsHelper
This helper class is used to draw select options form elements
- orderCallback
- in file utility.php, function orderCallback()
Callback function used to sort documents by a field
- outputArchiveList
- in file arc_tool.php, method ArcTool::outputArchiveList()
Lists the Web or IndexArchives in the crawl directory
- outputDictInfo
- in file arc_tool.php, method ArcTool::outputDictInfo()
Prints the IndexDictionary records for a word in an IndexArchiveBundle
- outputInfo
- in file arc_tool.php, method ArcTool::outputInfo()
Determines whether the supplied path is a WebArchiveBundle or an IndexArchiveBundle or non-Yioop Archive. Then outputs to stdout header information about the bundle by calling the appropriate sub-function.
- outputInfoIndexArchiveBundle
- in file arc_tool.php, method ArcTool::outputInfoIndexArchiveBundle()
Outputs to stdout header information for an IndexArchiveBundle bundle.
- outputInfoWebArchiveBundle
- in file arc_tool.php, method ArcTool::outputInfoWebArchiveBundle()
Outputs to stdout header information for a WebArchiveBundle bundle.
- outputPostingInfo
- in file arc_tool.php, method ArcTool::outputPostingInfo()
Prints information about $num many postings beginning at the provided $generation and $offset
- outputPostingLists
- in file index_shard.php, method IndexShard::outputPostingLists()
Used to convert the word_postings string into a word_docs string or if a file handle is provided write out the word_docs sequence of postings to the provided file handle.
- outputQueryData
- in file search_api.php, function outputQueryData()
Short function to pretty-print the data gotten back from a Yioop! query
- outputShowPages
- in file arc_tool.php, method ArcTool::outputShowPages()
Used to list out the pages/summaries stored in a bundle at $archive_path. It lists to stdout $num many documents starting at $start.
- options_helper.php
- procedural page options_helper.php
p
- $pages
- in file static_view.php, variable StaticView::$pages
This view makes use of the localized static page overview.thtml
- $pages
- in file index_bundle_iterator.php, variable IndexBundleIterator::$pages
Cache of what currentDocsWithWord returns
- $pages
- in file view.php, variable View::$pages
Localized static page elements used by this view
- $page_processors
- in file fetcher.php, variable Fetcher::$page_processors
An associative array of (mimetype => name of processor class to handle) pairs.
- $page_range_request
- in file fetcher.php, variable Fetcher::$page_range_request
Maximum number of bytes to download of a webpage
- $page_range_request
- in file queue_server.php, variable QueueServer::$page_range_request
Maximum number of bytes to download of a webpage
- $page_recrawl_frequency
- in file queue_server.php, variable QueueServer::$page_recrawl_frequency
Number of days between resets of the page url filter
- $page_rules
- in file queue_server.php, variable QueueServer::$page_rules
- $page_rule_parser
- in file fetcher.php, variable Fetcher::$page_rule_parser
- $partition
- in file web_archive_bundle.php, variable WebArchiveBundle::$partition
Used to contain the WebArchive partitions of the bundle
- $partition
- in file web_archive_bundle_iterator.php, variable WebArchiveBundleIterator::$partition
The current web archive in the bundle that is being iterated over
- $partitions
- in file text_archive_bundle_iterator.php, variable TextArchiveBundleIterator::$partitions
Array of filenames of arc files in this directory (glob order)
- $partition_index
- in file web_archive_bundle_iterator.php, variable WebArchiveBundleIterator::$partition_index
The item within the current partition to be returned next
- $pdo
- in file pdo_manager.php, variable PdoManager::$pdo
Used to hold the PDO database object
- $pdo_flag
- in file sqlite3_manager.php, variable Sqlite3Manager::$pdo_flag
Whether access to the Sqlite3 DB is through a PDO object or an SQLite3 object
- $plugin_processors
- in file fetcher.php, variable Fetcher::$plugin_processors
An associative array of (page processor => array of
- $post_max_size
- in file fetcher.php, variable Fetcher::$post_max_size
Maximum number of bytes which can be uploaded to the current queue server's web app in one go
- $prefixes
- in file index_shard.php, variable IndexShard::$prefixes
An array representing offsets into the words dictionary of the index of the first occurrence of a two byte prefix of a word_id.
- $prefixes_len
- in file index_shard.php, variable IndexShard::$prefixes_len
Length of the prefix index into the dictionary of the shard
- $processors
- in file indexing_plugin.php, variable IndexingPlugin::$processors
Array of the PageProcessor classes used by this IndexingPlugin
- $profile_fields
- in file profile_model.php, variable ProfileModel::$profile_fields
- page_rule_parser.php
- procedural page page_rule_parser.php
- persistent_structure.php
- procedural page persistent_structure.php
- phrase_parser.php
- procedural page phrase_parser.php
- priority_queue.php
- procedural page priority_queue.php
- page_processor.php
- procedural page page_processor.php
- pdf_processor.php
- procedural page pdf_processor.php
- png_processor.php
- procedural page png_processor.php
- pptx_processor.php
- procedural page pptx_processor.php
- ppt_processor.php
- procedural page ppt_processor.php
- pdo_manager.php
- procedural page pdo_manager.php
- parallel_model.php
- procedural page parallel_model.php
- phrase_model.php
- procedural page phrase_model.php
- profile_model.php
- procedural page profile_model.php
- packDoclenNum
- in file index_shard.php, method IndexShard::packDoclenNum()
Used to store the length of a document as well as the number of key components in its doc_id as a packed int (4 byte string)
- packFloat
- in file utility.php, function packFloat()
Packs a float into a 4 char string
- packInt
- in file utility.php, function packInt()
Packs an int into a 4 char string
- packLeft
- in file bzip2_block_iterator.php, method BZip2BlockIterator::packLeft()
Computes the new bzip2 block portion and the bits left over after adding $bytes to the passed $block.
- packListModified9
- in file utility.php, function packListModified9()
Packs the contents of a single word of a sequence being encoded using Modified9.
- packPosting
- in file utility.php, function packPosting()
Makes a packed integer string from a docindex and the number of occurrences of a word in the document with that docindex.
- packWords
- in file index_shard.php, method IndexShard::packWords()
Posting lists are initially stored associated with a word as a key value pair. The merge operation then merges these into a string held by word_postings. packWords separates words from postings.
- PAGE
- in file crawl_constants.php, class constant CrawlConstants::PAGE
- pageOptions
- in file admin_controller.php, method AdminController::pageOptions()
Handles admin request related to controlling file options to be used in a crawl
- PageOptionsElement
- in file pageoptions_element.php, class PageOptionsElement
This element is used to render the Page Options admin activity. This activity lets a user control the amount of web pages downloaded, the recrawl frequency, the file types, etc. of the pages crawled
- pageProcessing
- in file recipe_plugin.php, method RecipePlugin::pageProcessing()
This method is called by a PageProcessor in its handle() method
- pageProcessing
- in file indexing_plugin.php, method IndexingPlugin::pageProcessing()
This method is called by a PageProcessor in its handle() method
- PageProcessor
- in file page_processor.php, class PageProcessor
Base class common to all processors of web page data
- PageRuleParser
- in file page_rule_parser.php, class PageRuleParser
Has methods to parse user-defined page rules to apply to documents to be indexed.
- PAGE_COUNT_WIKIPEDIA
- in file nword_grams.php, class constant NWordGrams::PAGE_COUNT_WIKIPEDIA
- PAGE_COUNT_WIKTIONARY
- in file nword_grams.php, class constant NWordGrams::PAGE_COUNT_WIKTIONARY
- PAGE_IMPORTANCE
- in file crawl_constants.php, class constant CrawlConstants::PAGE_IMPORTANCE
- PAGE_RANGE_REQUEST
- in file crawl_constants.php, class constant CrawlConstants::PAGE_RANGE_REQUEST
- PAGE_RANGE_REQUEST
- in file config.php, constant PAGE_RANGE_REQUEST
request this many bytes out of a page -- this is the default value to
- PAGE_RECRAWL_FREQUENCY
- in file crawl_constants.php, class constant CrawlConstants::PAGE_RECRAWL_FREQUENCY
- PAGE_RECRAWL_FREQUENCY
- in file config.php, constant PAGE_RECRAWL_FREQUENCY
- PAGE_RULES
- in file crawl_constants.php, class constant CrawlConstants::PAGE_RULES
- PAGE_TIMEOUT
- in file config.php, constant PAGE_TIMEOUT
time in seconds before we give up on multi page requests
- PaginationHelper
- in file pagination_helper.php, class PaginationHelper
This helper class is used to handle pagination of search results
- ParallelModel
- in file parallel_model.php, class ParallelModel
Base class of models that need access to data from multiple queue servers. Subclasses include @see CrawlModel and @see PhraseModel.
- parseBrackets
- in file pdf_processor.php, method PdfProcessor::parseBrackets()
Extracts ASCII text till the next close brackets
- parseHeaderPage
- in file fetch_url.php, method FetchUrl::parseHeaderPage()
Splits an http response document into the http headers sent and the web page returned. Parses out useful information from the header and returns an array of these two parts and the useful info.
- parseIfConditions
- in file phrase_model.php, method PhraseModel::parseIfConditions()
Evaluates any if: conditional meta-words in the query string to calculate a new query string.
- parseParentheses
- in file pdf_processor.php, method PdfProcessor::parseParentheses()
Extracts ASCII text till the next close parenthesis
- parseRules
- in file page_rule_parser.php, method PageRuleParser::parseRules()
Parses a string of page rules into parse trees which can be executed later
- parseText
- in file pdf_processor.php, method PdfProcessor::parseText()
Extracts ASCII text from PDF data, getting rid of non-printable data, square brackets and parentheses, and converting char codes to their values.
- parseWordStructConjunctiveQuery
- in file phrase_model.php, method PhraseModel::parseWordStructConjunctiveQuery()
Parses from a string phrase representing a conjunctive query, a struct consisting of the words keys searched for, the allowed and disallowed phrases, the weight that should be put on these query results, and which archive to use.
- partitionByHash
- in file utility.php, function partitionByHash()
Used by a controller to take a table and return those rows in the table that a given queue_server would be responsible for handling
- PATHS
- in file crawl_constants.php, class constant CrawlConstants::PATHS
- PdfProcessor
- in file pdf_processor.php, class PdfProcessor
Used to create crawl summary information for PDF files
- PdoManager
- in file pdo_manager.php, class PdoManager
Pdo DatasourceManager
- peek
- in file priority_queue.php, method PriorityQueue::peek()
Gets the data stored at the ith location in the priority queue
- peekQueue
- in file web_queue_bundle.php, method WebQueueBundle::peekQueue()
Gets the url and weight of the ith entry in the priority queue
- percolateDown
- in file priority_queue.php, method PriorityQueue::percolateDown()
If the ith element in the PriorityQueue violates the heap property with some child node (children should be of lower priority than the parent), this function tries to modify the heap to restore the heap property.
- percolateUp
- in file priority_queue.php, method PriorityQueue::percolateUp()
If the $ith element in the PriorityQueue violates the heap property with its parent node (children should be of lower priority than the parent), this function tries to modify the heap to restore the heap property.
- PersistentStructure
- in file persistent_structure.php, class PersistentStructure
A PersistentStructure is a data structure which every so many operations will be saved to secondary storage (such as disk).
- PhraseModel
- in file phrase_model.php, class PhraseModel
This class is used to handle results for a given phrase search
- PhraseParser
- in file phrase_parser.php, class PhraseParser
Library of functions used to manipulate words and phrases
- PhraseParserTest
- in file phrase_parser_test.php, class PhraseParserTest
Used to test the PhraseParser class. Want to make sure bigram extraction works correctly
- PngProcessor
- in file png_processor.php, class PngProcessor
Used to create crawl summary information for PNG files
- poll
- in file priority_queue.php, method PriorityQueue::poll()
Removes and returns the ith element out of the Priority queue.
- POSITION_LIST
- in file crawl_constants.php, class constant CrawlConstants::POSITION_LIST
- POSTING_LEN
- in file index_shard.php, class constant IndexShard::POSTING_LEN
Length of one posting (a doc offset, occurrence pair) in a posting list
- postlude
- in file tokenizer.php, method ItStemmer::postlude()
Converts U and/or I back to lowercase
- postProcessing
- in file indexing_plugin.php, method IndexingPlugin::postProcessing()
This method is called by the queue_server with the name of
- postProcessing
- in file recipe_plugin.php, method RecipePlugin::postProcessing()
Implements post processing of recipes. Recipes are extracted, ingredients are scrubbed, and recipes are clustered. The clustered recipes are added back to the index.
- POST_MAX_SIZE
- in file crawl_constants.php, class constant CrawlConstants::POST_MAX_SIZE
- POST_PROCESSING
- in file indexing_plugin.php, constant POST_PROCESSING
Flag to say that post_processing is occurring (used to control logging in
- PptProcessor
- in file ppt_processor.php, class PptProcessor
Used to create crawl summary information for PPT files
- PptxProcessor
- in file pptx_processor.php, class PptxProcessor
Used to create crawl summary information for PPTX files
- PptxProcessorTest
- in file pptx_processor_test.php, class PptxProcessorTest
UnitTest for the PptxProcessor class. It is used to process pptx files, which are in an xml-based zip format
- PPT_IGNORING
- in file ppt_processor.php, class constant PptProcessor::PPT_IGNORING
- PRECISION
- in file config.php, constant PRECISION
precision to which floating point document scores are rounded
- PREFIX_HEADER_SIZE
- in file index_dictionary.php, class constant IndexDictionary::PREFIX_HEADER_SIZE
One dictionary file represents the words whose ids begin with a fixed char. Amongst these ids, the prefix index gives offsets for where ids with a given second char start. The total length of the records needed is PREFIX_ITEM_SIZE * NUM_PREFIX_LETTERS.
- PREFIX_ITEM_SIZE
- in file index_dictionary.php, class constant IndexDictionary::PREFIX_ITEM_SIZE
Size of an item in the prefix index used to look up words.
- prelude
- in file tokenizer.php, method ItStemmer::prelude()
Performs the following functions:
- prepareGlobals
- in file configure_tool.php, method ConfigureTool::prepareGlobals()
Sets up the field values of the super globals used by AdminController when changing a profile or managing passwords. These particular values don't change with respect to what this tool does.
- prepareUrlHeaders
- in file fetch_url.php, method FetchUrl::prepareUrlHeaders()
- prepareWordsAndPrefixes
- in file index_shard.php, method IndexShard::prepareWordsAndPrefixes()
Computes the prefix string index for the current words array.
- PREP_DIR
- in file config.php, constant PREP_DIR
- printContents
- in file web_queue_bundle.php, method WebQueueBundle::printContents()
Pretty prints the contents of the queue bundle in order
- printContents
- in file priority_queue.php, method PriorityQueue::printContents()
Pretty prints the contents of the queue viewed as an array.
- PriorityQueue
- in file priority_queue.php, class PriorityQueue
Code used to manage a memory efficient priority queue.
- PriorityQueueTest
- in file priority_queue_test.php, class PriorityQueueTest
Used to test the PriorityQueue class that is used to figure out which URL to crawl next
- privacy.php
- procedural page privacy.php
- process
- in file rtf_processor.php, method RtfProcessor::process()
Computes a summary based on a rtf string of a document
- process
- in file epub_processor.php, method EpubProcessor::process()
Used to extract the title, description and links from a string consisting of ebook publication data.
- process
- in file pdf_processor.php, method PdfProcessor::process()
Used to extract the title, description and links from a string consisting of PDF data.
- process
- in file rss_processor.php, method RssProcessor::process()
Used to extract the title, description and links from a string consisting of rss or atom news feed data.
- process
- in file xlsx_processor.php, method XlsxProcessor::process()
Used to extract the title, description and links from a xlsx file.
- process
- in file xml_processor.php, method XmlProcessor::process()
Used to extract the title, description and links from a string consisting of rss news feed data.
- process
- in file robot_processor.php, method RobotProcessor::process()
Parses the contents of a robots.txt page extracting allowed, disallowed paths, crawl-delay, and sitemaps. We also extract a list of all user agent strings seen.
- process
- in file text_processor.php, method TextProcessor::process()
Computes a summary based on a text string of a document
- process
- in file doc_processor.php, method DocProcessor::process()
Used to extract the title, description and links from a string consisting of Word Doc data (2004 or earlier).
- process
- in file image_processor.php, method ImageProcessor::process()
Extracts summary data from the image provided in $page together with the url in $url where it was downloaded from
- process
- in file ppt_processor.php, method PptProcessor::process()
Computes a summary based on a string of a binary Powerpoint document (as opposed to the modern xml powerpoint format).
- process
- in file jpg_processor.php, method JpgProcessor::process()
- process
- in file gif_processor.php, method GifProcessor::process()
- process
- in file pptx_processor.php, method PptxProcessor::process()
Used to extract the title, description and links from a pptx file consisting of xml data.
- process
- in file page_processor.php, method PageProcessor::process()
Should be implemented to compute a summary based on a
- process
- in file html_processor.php, method HtmlProcessor::process()
Used to extract the title, description and links from a string consisting of webpage data.
- process
- in file bmp_processor.php, method BmpProcessor::process()
- process
- in file svg_processor.php, method SvgProcessor::process()
Used to extract the title, description and links from a string consisting of an svg image. If the image is small enough, an attempt is made to generate a thumbnail
- process
- in file sitemap_processor.php, method SitemapProcessor::process()
Used to extract the title, description and links from a string consisting of rss news feed data.
- process
- in file png_processor.php, method PngProcessor::process()
- processCrawlData
- in file queue_server.php, method QueueServer::processCrawlData()
Main body of the queue_server loop where indexing, scheduling, and robot file processing are done.
- processDataArchive
- in file queue_server.php, method QueueServer::processDataArchive()
Process a file of to-crawl urls adding to or adjusting the weight in the PriorityQueue of those which have not been seen. Also updates the queue with seen url info
- processDataFile
- in file queue_server.php, method QueueServer::processDataFile()
Generic function used to process Data, Index, and Robot info schedules. Finds the first file in the directory of schedules of the given type, and calls the appropriate callback method for that type.
- processExternalPage
- in file odp_rdf_bundle_iterator.php, method OdpRdfArchiveBundleIterator::processExternalPage()
Computes an HTML page for an ExternalPage tag parsed from the ODP RDF document
- processFetchPages
- in file fetcher.php, method Fetcher::processFetchPages()
Processes an array of downloaded web pages with the appropriate page processor.
- processHandler
- in file crawl_daemon.php, method CrawlDaemon::processHandler()
Tick callback function used to update the timestamp in this processes
- processIndexArchive
- in file queue_server.php, method QueueServer::processIndexArchive()
Adds the summary and index data in $file to summary bundle and word index
- processIndexData
- in file queue_server.php, method QueueServer::processIndexData()
Sets up the directory to look for a file of unprocessed
- processQuery
- in file search_controller.php, method SearchController::processQuery()
Searches the database for the most relevant pages for the supplied search terms. Renders the results to the HTML page.
- processQueueUrls
- in file queue_server.php, method QueueServer::processQueueUrls()
Checks for a new crawl file or schedule data for the current crawl and, if such a file exists, processes its contents, adding the relevant urls to the priority queue
- processRecrawlDataArchive
- in file queue_server.php, method QueueServer::processRecrawlDataArchive()
Processes fetcher data file information during a recrawl
- processRecrawlRobotArchive
- in file queue_server.php, method QueueServer::processRecrawlRobotArchive()
Even during a recrawl the fetcher may send robot data to the queue_server. This function deletes the passed robot file.
- processRecrawlRobotUrls
- in file queue_server.php, method QueueServer::processRecrawlRobotUrls()
Even during a recrawl the fetcher may send robot data to the queue_server. This function prints a log message and calls another function to delete this useless robot file.
- processRequest
- in file archive_controller.php, method ArchiveController::processRequest()
Main method for this controller to handle requests. It first checks the request is valid, and then handles the corresponding activity
- processRequest
- in file admin_controller.php, method AdminController::processRequest()
This is the main entry point for handling requests to administer the Yioop/SeekQuarry site
- processRequest
- in file static_controller.php, method StaticController::processRequest()
This is the main entry point for handling people arriving to the SeekQuarry site.
- processRequest
- in file statistics_controller.php, method StatisticsController::processRequest()
Main handler for requests coming into this controller for web crawl
- processRequest
- in file controller.php, method Controller::processRequest()
This function should be overridden to handle web requests
- processRequest
- in file crawl_controller.php, method CrawlController::processRequest()
Checks that the request seems to be coming from a legitimate fetcher then determines which activity the fetcher is requesting and calls that activity for processing.
- processRequest
- in file machine_controller.php, method MachineController::processRequest()
Checks that the request seems to be coming from a legitimate fetcher then determines which activity the fetcher is requesting and calls that activity for processing.
- processRequest
- in file settings_controller.php, method SettingsController::processRequest()
Sets up the available perpage language options.
- processRequest
- in file fetch_controller.php, method FetchController::processRequest()
Checks that the request seems to be coming from a legitimate fetcher then determines which activity the fetcher is requesting and calls that activity for processing.
- processRequest
- in file search_controller.php, method SearchController::processRequest()
This is the main entry point for handling a search request.
- processRequest
- in file resource_controller.php, method ResourceController::processRequest()
Checks that the request seems to be coming from a legitimate fetcher or mirror server then determines which activity is being requested and calls the method for that activity.
- processRobotArchive
- in file queue_server.php, method QueueServer::processRobotArchive()
Reads in $file of robot data adding host-paths to the disallowed
- processRobotUrls
- in file queue_server.php, method QueueServer::processRobotUrls()
Checks how old the oldest robot data is and dumps if older than a
- processSession
- in file admin_controller.php, method AdminController::processSession()
Determines the user's current allowed activities and current activity, then calls the method for the latter.
- processSubdocs
- in file fetcher.php, method Fetcher::processSubdocs()
The pageProcessing method of an IndexingPlugin generates
- processTopic
- in file odp_rdf_bundle_iterator.php, method OdpRdfArchiveBundleIterator::processTopic()
Computes an HTML page for a Topic tag parsed from the ODP RDF document
- produceFetchBatch
- in file queue_server.php, method QueueServer::produceFetchBatch()
Produces a schedule.txt file of url data for a fetcher to crawl next.
- PROFILE
- in file config.php, constant PROFILE
- ProfileModel
- in file profile_model.php, class ProfileModel
This class is used to handle getting and saving the profile.php of the current search engine instance
- PROFILE_FILE_NAME
- in file config.php, constant PROFILE_FILE_NAME
setting profile.php to something else in local_config.php allows one to have
- PROXIMITY
- in file crawl_constants.php, class constant CrawlConstants::PROXIMITY
- pruneLinks
- in file fetcher.php, method Fetcher::pruneLinks()
Page processors are allowed to extract up to MAX_LINKS_TO_EXTRACT links. This method attempts to cull from the doc_info struct the best MAX_LINKS_PER_PAGE. Currently, this is done by first removing links whose filetype or site the crawler is forbidden from crawling.
- PUNCT
- in file config.php, constant PUNCT
Characters we view as not part of words, not same as POSIX [:punct:]
- put
- in file string_array.php, method StringArray::put()
Puts data into the ith item of the StringArray
- putGetTestCase
- in file string_array_test.php, method StringArrayTest::putGetTestCase()
Checks whether objects can be put into a string array and retrieved
- putRow
- in file priority_queue.php, method PriorityQueue::putRow()
Add data to the $i row of the priority queue viewed as an array. Calls the notifier associated with this queue about the change in data's location
- putSaveGetSavedTestCase
- in file string_array_test.php, method StringArrayTest::putSaveGetSavedTestCase()
Checks whether saving and loading of StringArrays works
- phrase_parser_test.php
- procedural page phrase_parser_test.php
- pptx_processor_test.php
- procedural page pptx_processor_test.php
- priority_queue_test.php
- procedural page priority_queue_test.php
- pageoptions_element.php
- procedural page pageoptions_element.php
- pagination_helper.php
- procedural page pagination_helper.php
q
- $query_info
- in file phrase_model.php, variable PhraseModel::$query_info
Used to hold query statistics about the current query
- $query_log
- in file datasource_manager.php, variable DatasourceManager::$query_log
Used to store statistics about what queries have been run depending on
- $queue_servers
- in file network_iterator.php, variable NetworkIterator::$queue_servers
An array of servers to ask a query to
- $queue_servers
- in file fetcher.php, variable Fetcher::$queue_servers
Array of Urls or IP addresses of the queue_servers to get sites to crawl
- $quota_clear_time
- in file queue_server.php, variable QueueServer::$quota_clear_time
Timestamp of the last time the download-from-site quotas were cleared
- $quota_sites
- in file queue_server.php, variable QueueServer::$quota_sites
Web-sites that have an hourly crawl quota
- $quota_sites_keys
- in file queue_server.php, variable QueueServer::$quota_sites_keys
Cache of array_keys of $quota_sites
- $quote_positions
- in file intersect_iterator.php, variable IntersectIterator::$quote_positions
This iterator returns only documents containing quoted terms in
- query_tool.php
- procedural page query_tool.php
- queue_server.php
- procedural page queue_server.php
- queryRequest
- in file search_controller.php, method SearchController::queryRequest()
Part of Yioop! Search API. Performs a normal search query and returns associative array of query results
- QueryTool
- in file query_tool.php, class QueryTool
Tool to provide a command line query interface to indexes stored in Yioop! database. Running with no arguments gives a help message for this tool.
- QUERY_INFO
- in file config.php, constant QUERY_INFO
bit of DEBUG_LEVEL used to indicate query statistics should be displayed
- QUERY_STATISTICS
- in file config.php, constant QUERY_STATISTICS
if true, query statistics are displayed
- Queue
- in file recipe_plugin.php, class Queue
queue for the BFS traversal
- QueueServer
- in file queue_server.php, class QueueServer
Command line program responsible for managing Yioop crawls.
- QueueServerTest
- in file queue_server_test.php, class QueueServerTest
Used to test functions related to scheduling websites to crawl for a web crawl (the responsibility of a QueueServer)
- queue_base_name
- in file crawl_constants.php, class constant CrawlConstants::queue_base_name
- QUEUE_SERVERS
- in file crawl_constants.php, class constant CrawlConstants::QUEUE_SERVERS
- QUEUE_SLEEP_TIME
- in file config.php, constant QUEUE_SLEEP_TIME
a queue_server's minimum loop idle time
- queue_server_test.php
- procedural page queue_server_test.php
r
- $r1_start
- in file tokenizer.php, variable ItStemmer::$r1_start
Storage used in computing the starting index of region R1
- $r1_string
- in file tokenizer.php, variable ItStemmer::$r1_string
Storage used in computing region R1
- $r2_start
- in file tokenizer.php, variable ItStemmer::$r2_start
Storage used in computing the starting index of region R2
- $r2_string
- in file tokenizer.php, variable ItStemmer::$r2_string
Storage used in computing region R2
- $read_only_archive
- in file web_archive_bundle.php, variable WebArchiveBundle::$read_only_archive
Controls whether the archive was opened in read only mode
- $read_only_from_disk
- in file index_shard.php, variable IndexShard::$read_only_from_disk
Flag used to determine if this shard is going to be largely kept on disk and to be in read only mode. Otherwise, the shard is assumed to be completely held in memory and to be read/writable.
- $read_tier
- in file index_dictionary.php, variable IndexDictionary::$read_tier
Tier currently being used to read dictionary data from
- $recrawl_check_scheduler
- in file fetcher.php, variable Fetcher::$recrawl_check_scheduler
Keeps track of whether during the recrawl we should notify a
- $restrict_sites_by_url
- in file queue_server.php, variable QueueServer::$restrict_sites_by_url
Says whether the $allowed_sites array is being used or not
- $restrict_sites_by_url
- in file fetcher.php, variable Fetcher::$restrict_sites_by_url
Says whether the $allowed_sites array is being used or not
- $result
- in file sqlite_manager.php, variable SqliteManager::$result
Stores the result resource of the last DB exec
- $results_per_block
- in file index_bundle_iterator.php, variable IndexBundleIterator::$results_per_block
Number of documents returned for each block (at most)
- $result_dir
- in file archive_bundle_iterator.php, variable ArchiveBundleIterator::$result_dir
The path to the directory where the iteration status is stored.
- $result_timestamp
- in file archive_bundle_iterator.php, variable ArchiveBundleIterator::$result_timestamp
Timestamp of the archive that is being used to store results in
- $robot_archive
- in file web_queue_bundle.php, variable WebQueueBundle::$robot_archive
WebArchive used to store paths coming from robots.txt files
- $robot_table
- in file web_queue_bundle.php, variable WebQueueBundle::$robot_table
HashTable used to store offsets into WebArchive that stores robot paths
- $rule_trees
- in file page_rule_parser.php, variable PageRuleParser::$rule_trees
Used to store parse trees that this parser executes
- $rv_start
- in file tokenizer.php, variable ItStemmer::$rv_start
Storage used in computing the starting index of region RV
- $rv_string
- in file tokenizer.php, variable ItStemmer::$rv_string
Storage used in computing Region RV
- resource_controller.php
- procedural page resource_controller.php
- recipe_plugin.php
- procedural page recipe_plugin.php
- robot_processor.php
- procedural page robot_processor.php
- rss_processor.php
- procedural page rss_processor.php
- rtf_processor.php
- procedural page rtf_processor.php
- role_model.php
- procedural page role_model.php
- r1
- in file tokenizer.php, method ItStemmer::r1()
Computes the starting index for region R1
- r2
- in file tokenizer.php, method ItStemmer::r2()
Computes the starting index for region R2
- readBlockDictAtOffset
- in file index_dictionary.php, method IndexDictionary::readBlockDictAtOffset()
Reads DICT_BLOCK_SIZE bytes from the prefix file $file_num beginning at byte offset $bytes
- readBlockShardAtOffset
- in file index_shard.php, method IndexShard::readBlockShardAtOffset()
Reads SHARD_BLOCK_SIZE from the current IndexShard's file beginning at byte offset $bytes
- readInfoBlock
- in file web_archive.php, method WebArchive::readInfoBlock()
Read the info block associated with this web archive.
- readInput
- in file utility.php, function readInput()
Used to read a line of input from the command-line
- readMediaWikiHeader
- in file mediawiki_bundle_iterator.php, method MediaWikiArchiveBundleIterator::readMediaWikiHeader()
Reads the siteinfo tag of the mediawiki xml file and extracts data that will be used in constructing page summaries.
- readMessage
- in file utility.php, function readMessage()
Used to read several lines from the terminal up until
- readPassword
- in file utility.php, function readPassword()
Used to read a line of input from the command-line
- READ_LEN_TEXT_SEG
- in file ppt_processor.php, class constant PptProcessor::READ_LEN_TEXT_SEG
- rebuildHashTable
- in file web_queue_bundle.php, method WebQueueBundle::rebuildHashTable()
Makes a new HashTable without deleted rows
- rebuildUrlTable
- in file web_queue_bundle.php, method WebQueueBundle::rebuildUrlTable()
Since offsets are integers, even if the queue is kept relatively small, periodically we will need to rebuild the archive for storing urls.
- RECENT_URLS
- in file crawl_constants.php, class constant CrawlConstants::RECENT_URLS
- RecipePlugin
- in file recipe_plugin.php, class RecipePlugin
This class handles recipe processing.
- recordCmp
- in file index_dictionary.php, method IndexDictionary::recordCmp()
Does a lexicographical comparison of the word_ids of two word records.
- REDO_STATE
- in file crawl_constants.php, class constant CrawlConstants::REDO_STATE
- reindexIndexArchive
- in file arc_tool.php, method ArcTool::reindexIndexArchive()
Used to recompute the dictionary of an index archive -- either from scratch using the index shard data or just using the current dictionary but merging the tiers into one tier
- reinsertCollisionAndIndexTestCase
- in file hash_table_test.php, method HashTableTest::reinsertCollisionAndIndexTestCase()
First check that inserting an item twice does not change its index in
- relatedRequest
- in file search_controller.php, method SearchController::relatedRequest()
Part of Yioop! Search API. Performs a search query for pages related to a given url and returns associative array of query results
- RELEVANCE
- in file crawl_constants.php, class constant CrawlConstants::RELEVANCE
- reloadArchiveTestCase
- in file web_archive_test.php, method WebArchiveTest::reloadArchiveTestCase()
If the file associated with a web archive already exists when the
- removeQueue
- in file web_queue_bundle.php, method WebQueueBundle::removeQueue()
Removes a url from the priority queue.
- render
- in file managelocales_element.php, method ManagelocalesElement::render()
Responsible for drawing the create, delete, set writing mode screen for locales as well as the screen for adding and modifying translations
- render
- in file managecrawls_element.php, method ManagecrawlsElement::render()
Draw form to start a new crawl, has div place holder and ajax code to get info about current crawl
- render
- in file manageaccount_element.php, method ManageaccountElement::render()
Draws a change password form.
- render
- in file manageusers_element.php, method ManageusersElement::render()
Draws a screen in which an admin can add users, delete users, and manipulate user roles.
- render
- in file toggle_helper.php, method ToggleHelper::render()
Draws an On Off switch in HTML where to toggle state one clicks a link
- render
- in file displayresults_helper.php, method DisplayresultsHelper::render()
- render
- in file manageroles_element.php, method ManagerolesElement::render()
Renders the screen in which roles can be created, deleted, and added to or deleted from a user
- render
- in file managemachines_element.php, method ManagemachinesElement::render()
Draws the ManageMachines element to the output buffer
- render
- in file footer_element.php, method FooterElement::render()
Element used to render the login screen for the admin control panel
- render
- in file feeds_helper.php, method FeedsHelper::render()
Takes page summaries for RSS pages and the current query and draws list of news links and a link to the news link subsearch page if applicable.
- render
- in file language_element.php, method LanguageElement::render()
Draws a select tag with a list of available languages
- render
- in file signin_element.php, method SigninElement::render()
Method responsible for drawing links to settings and login panels
- render
- in file images_helper.php, method ImagesHelper::render()
Takes page summaries for image pages and the current query and draws a thumbnail strip so that clicking on an image goes to the cache of that image.
- render
- in file searchsources_element.php, method SearchsourcesElement::render()
Renders search source and subsearch forms
- render
- in file resultseditor_element.php, method ResultsEditorElement::render()
Draws the Screen for the Search Filter activity. This activity is used to filter urls out of the search results
- render
- in file layout.php, method Layout::render()
The render method of Layout and its subclasses is responsible for drawing the header of the document, calling the renderView method of the View that lives on the layout and then drawing the footer of the document.
- render
- in file element.php, method Element::render()
This method is responsible for actually drawing the view.
- render
- in file editmix_element.php, method EditmixElement::render()
Draw form to start a new crawl, has div place holder and ajax code to get info about current crawl
- render
- in file editlocales_element.php, method EditlocalesElement::render()
Draws a form with strings to translate and a text field for the translation into the given locale. Strings with no translations yet appear in red
- render
- in file filetype_helper.php, method FiletypeHelper::render()
Outputs the filetype corresponding to the supplied mime type.
- render
- in file rss_layout.php, method RssLayout::render()
Responsible for drawing the header of the document containing Yioop! title and including basic.js. It calls the renderView method of the View that lives on the layout. If the QUERY_STATISTICS config setting is set, it outputs statistics about each query run on the database.
- render
- in file editstatic_element.php, method EditstaticElement::render()
Draws the forms used to edit static pages.
- render
- in file machinelog_element.php, method MachinelogElement::render()
Draws the log file of a queue_server or a fetcher
- render
- in file subsearch_element.php, method SubsearchElement::render()
Method responsible for drawing links to common subsearches
- render
- in file configure_element.php, method ConfigureElement::render()
Draws the forms used to configure the search engine.
- render
- in file options_helper.php, method OptionsHelper::render()
Draws an HTML select tag according to the supplied parameters
- render
- in file mixcrawls_element.php, method MixcrawlsElement::render()
Draw form to start a new crawl, has div place holder and ajax code to get info about current crawl
- render
- in file activity_element.php, method ActivityElement::render()
Displays a list of admin activities
- render
- in file videourl_helper.php, method VideourlHelper::render()
Used to check if a url is the url of a video site and if so draw a link with a thumbnail from the video.
- render
- in file pagination_helper.php, method PaginationHelper::render()
Draws a strip of links which begins with a previous link (if there are previous pages of links) followed by up to ten links to more search result pages (if available) followed by a next set of pages link.
- render
- in file web_layout.php, method WebLayout::render()
Responsible for drawing the header of the document containing Yioop! title and including basic.js. It calls the renderView method of the View that lives on the layout. If the QUERY_STATISTICS config setting is set, it outputs statistics about each query run on the database.
- render
- in file view.php, method View::render()
This method is responsible for drawing both the layout and the view. It should not be modified to change the display of the view. Instead, implement renderView.
- render
- in file crawloptions_element.php, method CrawloptionsElement::render()
Draws configurable options about how a web crawl should be conducted
- render
- in file pageoptions_element.php, method PageOptionsElement::render()
Draws the page options element to the output buffer
- renderView
- in file settings_view.php, method SettingsView::renderView()
Draws the web page on which users can control their search settings.
- renderView
- in file search_view.php, method SearchView::renderView()
Draws the main landing pages as well as search result pages
- renderView
- in file fetch_view.php, method FetchView::renderView()
Draws a message to be used by a fetcher. It might, for example, contain a schedule of sites to crawl
- renderView
- in file static_view.php, method StaticView::renderView()
Draws the login web page.
- renderView
- in file rss_view.php, method RssView::renderView()
Draws the main landing pages as well as search result pages
- renderView
- in file view.php, method View::renderView()
This abstract method is implemented in sub classes with code which actually draws the view. The current layouts render method calls this function.
- renderView
- in file signin_view.php, method SigninView::renderView()
Draws the login web page.
- renderView
- in file nocache_view.php, method NocacheView::renderView()
Draws a simple message saying no cache available of the requested page
- renderView
- in file crawlstatus_view.php, method CrawlstatusView::renderView()
An Ajax call from the Manage Crawl Element in Admin View triggers this view to be instantiated. The renderView method then draws statistics about the currently active crawl. The $data is supplied by the crawlStatus method of the AdminController.
- renderView
- in file machinestatus_view.php, method MachinestatusView::renderView()
Draws the MachinestatusView to the output buffer
- renderView
- in file statistics_view.php, method StatisticsView::renderView()
Draws the web page used to display statistics about the default crawl
- renderView
- in file admin_view.php, method AdminView::renderView()
Renders the list of admin activities and draws the current activity. Renders the Javascript to autologout after an hour
- replace
- in file code_tool.php, function replace()
Performs a search and replace for given pattern in files in supplied sub-folder/file
- replaceFile
- in file code_tool.php, function replaceFile()
Callback function applied to each file in the directory being traversed
- reschedulePages
- in file fetcher.php, method Fetcher::reschedulePages()
Sorts out pages for which no content was downloaded so that they can be scheduled to be crawled again.
- reset
- in file text_archive_bundle_iterator.php, method TextArchiveBundleIterator::reset()
Resets the iterator to the start of the archive bundle
- reset
- in file web_archive.php, method WebArchive::reset()
Resets the iterator for this web archive to the first object
- reset
- in file bloom_filter_bundle.php, method BloomFilterBundle::reset()
Empties the contents of the bloom filter bundle and resets it to start storing new data.
- reset
- in file intersect_iterator.php, method IntersectIterator::reset()
Returns the iterators to the first document block that it could iterate
- reset
- in file web_archive_bundle_iterator.php, method WebArchiveBundleIterator::reset()
Resets the iterator to the start of the archive bundle
- reset
- in file doc_iterator.php, method DocIterator::reset()
Returns the iterators to the first document block that it could iterate
- reset
- in file database_bundle_iterator.php, method DatabaseBundleIterator::reset()
Resets the iterator to the start of the archive bundle
- reset
- in file group_iterator.php, method GroupIterator::reset()
Returns the iterators to the first document block that it could iterate
- reset
- in file index_bundle_iterator.php, method IndexBundleIterator::reset()
Returns the iterators to the first document block that it could iterate
- reset
- in file archive_bundle_iterator.php, method ArchiveBundleIterator::reset()
Resets the iterator to the start of the archive bundle
- reset
- in file negation_iterator.php, method NegationIterator::reset()
Returns the iterators to the first document block that it could iterate
- reset
- in file word_iterator.php, method WordIterator::reset()
Returns the iterators to the first document block that it could iterate
- reset
- in file mix_archive_bundle_iterator.php, method MixArchiveBundleIterator::reset()
Resets the iterator to the start of the archive bundle
- reset
- in file union_iterator.php, method UnionIterator::reset()
Returns the iterators to the first document block that it could iterate
- reset
- in file network_iterator.php, method NetworkIterator::reset()
Returns the iterators to the first document block that it could iterate
- ResourceController
- in file resource_controller.php, class ResourceController
Used to serve resources such as images, css, or scripts from APP_DIR
- restartCrashedFetchers
- in file machine_model.php, method MachineModel::restartCrashedFetchers()
Used to restart any fetchers which the user turned on, but which
- restoreCheckpoint
- in file archive_bundle_iterator.php, method ArchiveBundleIterator::restoreCheckpoint()
Restores the internal state from the file iterate_status.txt in the
- restoreCheckPoint
- in file text_archive_bundle_iterator.php, method TextArchiveBundleIterator::restoreCheckPoint()
Restores the internal state from the file iterate_status.txt in the
- restoreCheckpoint
- in file mix_archive_bundle_iterator.php, method MixArchiveBundleIterator::restoreCheckpoint()
Restores state from a previous instantiation, after the last batch of pages extracted.
- restoreCheckPoint
- in file database_bundle_iterator.php, method DatabaseBundleIterator::restoreCheckPoint()
Restores the internal state from the file iterate_status.txt in the result dir such that the next call to nextPages will pick up from just after the last checkpoint.
- restoreCheckpoint
- in file web_archive_bundle_iterator.php, method WebArchiveBundleIterator::restoreCheckpoint()
Restores state from a previous instantiation, after the last batch of pages extracted.
- restoreCheckPoint
- in file mediawiki_bundle_iterator.php, method MediaWikiArchiveBundleIterator::restoreCheckPoint()
Restores the internal state from the file iterate_status.txt in the result dir such that the next call to nextPages will pick up from just after the last checkpoint. We also set up our regex substitutions again
- restrictQueryByUserAgent
- in file search_controller.php, method SearchController::restrictQueryByUserAgent()
Sometimes robots disobey the statistics page nofollow meta tag.
- RESTRICT_SITES_BY_URL
- in file crawl_constants.php, class constant CrawlConstants::RESTRICT_SITES_BY_URL
- resultsEditor
- in file admin_controller.php, method AdminController::resultsEditor()
Handles admin request related to the search filter activity
- ResultsEditorElement
- in file resultseditor_element.php, class ResultsEditorElement
Element used to control how urls are filtered out of search results (if desired) after a crawl has already been performed.
- RESULTS_PER_BLOCK
- in file index_bundle_iterator.php, class constant IndexBundleIterator::RESULTS_PER_BLOCK
Default number of documents returned for each block (at most)
- RETURN_BOTH
- in file hash_table.php, class constant HashTable::RETURN_BOTH
Flag for hash table lookup methods
- RETURN_PROBE_ON_KEY_FOUND
- in file hash_table.php, class constant HashTable::RETURN_PROBE_ON_KEY_FOUND
Flag for hash table lookup methods
- RETURN_VALUE
- in file hash_table.php, class constant HashTable::RETURN_VALUE
Flag for hash table lookup methods
- rewriteMixQuery
- in file phrase_model.php, method PhraseModel::rewriteMixQuery()
Rewrites a mix query so that it maps directly to a query about crawls
- ROBOT
- in file web_queue_bundle.php, class constant WebQueueBundle::ROBOT
Url type flag
- RobotProcessor
- in file robot_processor.php, class RobotProcessor
Processor class used to extract information from robots.txt files
- robotSetUp
- in file configure_tool.php, method ConfigureTool::robotSetUp()
Used to set up the name of this instance of the Yioop robot as well as its description page.
- robot_data_base_name
- in file crawl_constants.php, class constant CrawlConstants::robot_data_base_name
- ROBOT_INSTANCE
- in file crawl_constants.php, class constant CrawlConstants::ROBOT_INSTANCE
- ROBOT_METAS
- in file crawl_constants.php, class constant CrawlConstants::ROBOT_METAS
- ROBOT_PATHS
- in file crawl_constants.php, class constant CrawlConstants::ROBOT_PATHS
- robot_table_name
- in file crawl_constants.php, class constant CrawlConstants::robot_table_name
- ROBOT_TXT
- in file crawl_constants.php, class constant CrawlConstants::ROBOT_TXT
- RoleModel
- in file role_model.php, class RoleModel
This class is used to handle db results related to Role Administration
- rootPassword
- in file configure_tool.php, method ConfigureTool::rootPassword()
Used to change the password of the root account of this Yioop Instance
- rorderCallback
- in file utility.php, function rorderCallback()
Callback function used to sort documents by a field in reverse order
- RssLayout
- in file rss_layout.php, class RssLayout
Layout used for the seek_quarry Website including pages such as search landing page and settings page
- RssProcessor
- in file rss_processor.php, class RssProcessor
Used to create crawl summary information for RSS or Atom files
- RssView
- in file rss_view.php, class RssView
Web page used to present search results. It also contains the search box for people to type searches into
- RSS_ACCESS
- in file config.php, constant RSS_ACCESS
- RtfProcessor
- in file rtf_processor.php, class RtfProcessor
Used to create crawl summary information for RTF files
- run
- in file unit_test.php, method UnitTest::run()
Execute each of the test cases of this unit test and return the results
- runAllTests
- in file index.php, function runAllTests()
Runs all the unit_tests in the current directory and displays the results
- runPostProcessingPlugins
- in file queue_server.php, method QueueServer::runPostProcessingPlugins()
During crawl shutdown this is called to run any post processing plugins
- runTest
- in file index.php, function runTest()
Uses $name to load a unit test class, run the tests in it and display the results
- runTestBasedOnRequest
- in file index.php, function runTestBasedOnRequest()
Run the single unit test whose name is given in $_REQUEST['test'] and display the results. If the unit test file was blah_test.php, then $_REQUEST['test'] should be blah.
- rv
- in file tokenizer.php, method ItStemmer::rv()
Computes the starting index for region RV
- resultseditor_element.php
- procedural page resultseditor_element.php
- rss_layout.php
- procedural page rss_layout.php
- rss_view.php
- procedural page rss_view.php
s
- $save_frequency
- in file persistent_structure.php, variable PersistentStructure::$save_frequency
Number of operations between saves. If == -1 never save using checkSave
- $schedule_time
- in file fetcher.php, variable Fetcher::$schedule_time
Timestamp from a queue_server of the current schedule of sites to download. This is sent back to the server once this schedule is completed to help the queue server implement crawl-delay if needed.
- $seen_docs
- in file index_bundle_iterator.php, variable IndexBundleIterator::$seen_docs
The number of documents already iterated over
- $seen_docs_unfiltered
- in file intersect_iterator.php, variable IntersectIterator::$seen_docs_unfiltered
The number of iterated docs before the restriction test
- $seen_docs_unfiltered
- in file group_iterator.php, variable GroupIterator::$seen_docs_unfiltered
The number of iterated docs before the restriction test
- $seen_docs_unfiltered
- in file union_iterator.php, variable UnionIterator::$seen_docs_unfiltered
The number of iterated docs before the restriction test
- $server_name
- in file queue_server.php, variable QueueServer::$server_name
String used to describe this kind of queue server (Indexer, Scheduler, etc.) in the log files.
- $server_type
- in file queue_server.php, variable QueueServer::$server_type
Used to say what kind of queue_server this is (one of BOTH, INDEXER,
- $shard_doc_lens
- in file index_dictionary.php, variable IndexDictionary::$shard_doc_lens
Length of the doc strings for each of the shards that have been added to the dictionary.
- $shard_lens
- in file doc_iterator.php, variable DocIterator::$shard_lens
An array of shard docids_lens
- $special_quote
- in file mysql_manager.php, variable MysqlManager::$special_quote
Used when to quote column names of db names that contain a
- $sql
- in file database_bundle_iterator.php, variable DatabaseBundleIterator::$sql
SQL query whose records we are indexing
- $start_delimiter
- in file text_archive_bundle_iterator.php, variable TextArchiveBundleIterator::$start_delimiter
Starting delimiters for records
- $start_offset
- in file word_iterator.php, variable WordIterator::$start_offset
Starting offset of word occurrence in the IndexShard
- $start_sync
- in file mirror.php, variable Mirror::$start_sync
Time of start of current sync
- $stats_file
- in file statistics_controller.php, variable StatisticsController::$stats_file
Name of the file in which to cache generated statistics
- $status_activities
- in file admin_controller.php, variable AdminController::$status_activities
An array of activities which are periodically updated within the other activities in which they live. For example, within manage crawl, the current crawl status is updated every 20 or so seconds.
- $status_filename
- in file text_archive_bundle_iterator.php, variable TextArchiveBundleIterator::$status_filename
Name of the file to which this archive iterator's status messages are written
- $step1_changes
- in file tokenizer.php, variable ItStemmer::$step1_changes
Storage used in determining if step1 removed any endings from the word
- $storage
- in file web_archive.php, variable WebArchive::$storage
If archive is stored as a string rather than persistently to disk
- $string_array
- in file string_array.php, variable StringArray::$string_array
Character string used to store the packed data of the StringArray
- $string_array_size
- in file string_array.php, variable StringArray::$string_array_size
Number of bytes of storage needed by the string array
- $subname
- in file crawl_daemon.php, variable CrawlDaemon::$subname
Subname of the name prefix used on files associated with this daemon. For example, the name might be fetcher and the subname might be 2 to indicate which fetcher daemon instance.
- $subsearch_identifier
- in file search_controller.php, variable SearchController::$subsearch_identifier
The localization identifier for the current subsearch
- $subsearch_name
- in file search_controller.php, variable SearchController::$subsearch_name
Name of the sub-search currently in use
- $summaries
- in file index_archive_bundle.php, variable IndexArchiveBundle::$summaries
WebArchiveBundle for web page summaries
- $sum_seen_site_description_length
- in file fetcher.php, variable Fetcher::$sum_seen_site_description_length
The sum of the number of words of all the page descriptions for the current crawl. This is used in computing document statistics.
- $sum_seen_site_link_length
- in file fetcher.php, variable Fetcher::$sum_seen_site_link_length
The sum of the number of words in all the page links for the current crawl. This is used in computing document statistics.
- $sum_seen_title_length
- in file fetcher.php, variable Fetcher::$sum_seen_title_length
The sum of the number of words of all the page titles for the current crawl. This is used in computing document statistics.
- $switch_partition_callback_name
- in file text_archive_bundle_iterator.php, variable TextArchiveBundleIterator::$switch_partition_callback_name
Name of the function to be called whenever the partition that the iterator is reading changes. The point of the callback is to read meta information at the start of the new partition.
- $sync_dir
- in file mirror.php, variable Mirror::$sync_dir
Directory to sync
- $sync_schedule
- in file mirror.php, variable Mirror::$sync_schedule
Files to download for current sync
- search_controller.php
- procedural page search_controller.php
- settings_controller.php
- procedural page settings_controller.php
- static_controller.php
- procedural page static_controller.php
- statistics_controller.php
- procedural page statistics_controller.php
- search_api.php
- procedural page search_api.php
- sitemap_processor.php
- procedural page sitemap_processor.php
- svg_processor.php
- procedural page svg_processor.php
- string_array.php
- procedural page string_array.php
- sqlite3_manager.php
- procedural page sqlite3_manager.php
- sqlite_manager.php
- procedural page sqlite_manager.php
- searchfilters_model.php
- procedural page searchfilters_model.php
- signin_model.php
- procedural page signin_model.php
- source_model.php
- procedural page source_model.php
- save
- in file index_shard.php, method IndexShard::save()
Save the IndexShard to its filename
- save
- in file string_array.php, method StringArray::save()
Save the StringArray to its filename
- save
- in file persistent_structure.php, method PersistentStructure::save()
Save the PersistentStructure to its filename
- saveAndAddCurrentShardDictionary
- in file index_archive_bundle.php, method IndexArchiveBundle::saveAndAddCurrentShardDictionary()
Saves the active index shard to disk, then adds the words from this
- saveCheckPoint
- in file database_bundle_iterator.php, method DatabaseBundleIterator::saveCheckPoint()
Used to save the result row we are at so that the iterator can start from that row the next time it is invoked.
- saveCheckPoint
- in file text_archive_bundle_iterator.php, method TextArchiveBundleIterator::saveCheckPoint()
Stores the current progress to the file iterate_status.txt in the result
- saveCheckpoint
- in file web_archive_bundle_iterator.php, method WebArchiveBundleIterator::saveCheckpoint()
Saves the current state so that a new instantiation can pick up just after the last batch of pages extracted.
- saveCheckpoint
- in file mix_archive_bundle_iterator.php, method MixArchiveBundleIterator::saveCheckpoint()
Saves the current state so that a new instantiation can pick up just after the last batch of pages extracted.
- saveCheckpoint
- in file archive_bundle_iterator.php, method ArchiveBundleIterator::saveCheckpoint()
Stores the current progress to the file iterate_status.txt in the result
- saveCronTable
- in file cron_model.php, method CronModel::saveCronTable()
Serializes and saves the current cron table to disk
- SAVED_CRAWL_TIMES
- in file crawl_constants.php, class constant CrawlConstants::SAVED_CRAWL_TIMES
- saveLoadTestCase
- in file bloom_filter_file_test.php, method BloomFilterFileTest::saveLoadTestCase()
Check that if we force save the bloom filter file and then we reload it
- saveLoadTestCase
- in file index_shard_test.php, method IndexShardTest::saveLoadTestCase()
Check that save and load work
- saveMetaData
- in file bloom_filter_bundle.php, method BloomFilterBundle::saveMetaData()
Saves the meta data (number of filters, number of items stored, and size)
- save_point
- in file crawl_constants.php, class constant CrawlConstants::save_point
- SCAN_TEXT_SEG
- in file ppt_processor.php, class constant PptProcessor::SCAN_TEXT_SEG
- SCHEDULABLE
- in file web_queue_bundle.php, class constant WebQueueBundle::SCHEDULABLE
Url type flag
- schedule
- in file fetch_controller.php, method FetchController::schedule()
Checks if there is a schedule of sites to crawl available and, if so, presents it to the requesting fetcher and then deletes it.
- SCHEDULER
- in file crawl_constants.php, class constant CrawlConstants::SCHEDULER
Used to say what kind of queue_server this is
- schedule_data_base_name
- in file crawl_constants.php, class constant CrawlConstants::schedule_data_base_name
- schedule_name
- in file crawl_constants.php, class constant CrawlConstants::schedule_name
- schedule_start_name
- in file crawl_constants.php, class constant CrawlConstants::schedule_start_name
- SCHEDULE_TIME
- in file crawl_constants.php, class constant CrawlConstants::SCHEDULE_TIME
- SCORE
- in file crawl_constants.php, class constant CrawlConstants::SCORE
- SCORE_PRECISION
- in file model.php, constant SCORE_PRECISION
- search
- in file code_tool.php, function search()
Performs a search for given pattern in files in supplied sub-folder/file
- searchAccess
- in file configure_tool.php, method ConfigureTool::searchAccess()
Configures which methods are allowed by this Yioop instance to access
- SearchController
- in file search_controller.php, class SearchController
Controller used to handle search requests to SeekQuarry search site. Used to both get and display search results.
- searchFile
- in file code_tool.php, function searchFile()
Callback function applied to each file in the directory being traversed by @see search(). Searches $filename for lines matching $pattern and outputs the line numbers and lines
- SearchfiltersModel
- in file searchfilters_model.php, class SearchfiltersModel
This class manages the persistence to disk of a set of urls to be filtered from all search results returned by Yioop!
- searchPageElementLinks
- in file configure_tool.php, method ConfigureTool::searchPageElementLinks()
Configures which of the various links on the SERP, such as Cache, etc., should be displayed. Also configures whether the signin links, etc., should be displayed.
- searchSources
- in file admin_controller.php, method AdminController::searchSources()
Handles admin request related to the search sources activity
- SearchsourcesElement
- in file searchsources_element.php, class SearchsourcesElement
Contains the forms for managing search sources for video, news, etc.
- SearchView
- in file search_view.php, class SearchView
Web page used to present search results. It also contains the search box for people to type searches into
- SECONDS_IN_A_BIN
- in file file_cache.php, class constant FileCache::SECONDS_IN_A_BIN
How many seconds a bin is vulnerable to be deleted as expired
- seekEndObjects
- in file web_archive.php, method WebArchive::seekEndObjects()
Seeks in the WebArchive file to the end of the last Object.
- seekPage
- in file database_bundle_iterator.php, method DatabaseBundleIterator::seekPage()
Advances the iterator to the $limit page, with as little additional processing as possible
- seekPage
- in file archive_bundle_iterator.php, method ArchiveBundleIterator::seekPage()
Advances the iterator to the $limit page, with as little additional processing as possible
- SEEN_URLS
- in file crawl_constants.php, class constant CrawlConstants::SEEN_URLS
- SEEN_URLS_BEFORE_UPDATE_SCHEDULER
- in file config.php, constant SEEN_URLS_BEFORE_UPDATE_SCHEDULER
How many non robot urls the fetcher successfully downloads before
- SEGMENT_SIZE
- in file index_dictionary.php, class constant IndexDictionary::SEGMENT_SIZE
When merging two files on a given dictionary tier. This is the max number
- selectCurrentServerAndUpdateIfNeeded
- in file fetcher.php, method Fetcher::selectCurrentServerAndUpdateIfNeeded()
At least once, and while memory is low, picks a server at random and sends any fetcher data we have to it.
- selectDB
- in file mysql_manager.php, method MysqlManager::selectDB()
- selectDB
- in file datasource_manager.php, method DatasourceManager::selectDB()
Connects to the correct DB on that system
- selectDB
- in file sqlite3_manager.php, method Sqlite3Manager::selectDB()
- selectDB
- in file pdo_manager.php, method PdoManager::selectDB()
- selectDB
- in file sqlite_manager.php, method SqliteManager::selectDB()
- sendStartCrawlMessage
- in file crawl_model.php, method CrawlModel::sendStartCrawlMessage()
Used to send a message to the queue_servers to start a crawl
- sendStartCrawlMessage
- in file crawl_controller.php, method CrawlController::sendStartCrawlMessage()
Receives a request to start a crawl from a remote name server
- sendStopCrawlMessage
- in file crawl_model.php, method CrawlModel::sendStopCrawlMessage()
Used to send a message to the queue_servers to stop a crawl
- sendStopCrawlMessage
- in file crawl_controller.php, method CrawlController::sendStopCrawlMessage()
Receives a request to stop a crawl from a remote name server
- SERVER
- in file crawl_constants.php, class constant CrawlConstants::SERVER
- SERVER_ALPHA
- in file config.php, constant SERVER_ALPHA
For a given number of search results total to return (total_num)
- SERVER_VERSION
- in file crawl_constants.php, class constant CrawlConstants::SERVER_VERSION
- SESSION_NAME
- in file config.php, constant SESSION_NAME
name of the cookie used to manage the session
- set
- in file searchfilters_model.php, method SearchfiltersModel::set()
Sets a list of hostnames to be filtered from search results
- set
- in file analytics_manager.php, method AnalyticsManager::set()
Used to set the timing statistic $value associated with $attribute
- set
- in file file_cache.php, method FileCache::set()
Stores in the cache a key-value pair
- setArchiveInfo
- in file web_archive_bundle.php, method WebArchiveBundle::setArchiveInfo()
Sets the archive info (DESCRIPTION, COUNT, NUM_DOCS_PER_PARTITION) for this web archive
- setArchiveInfo
- in file index_archive_bundle.php, method IndexArchiveBundle::setArchiveInfo()
Sets the archive info (DESCRIPTION, COUNT, NUM_DOCS_PER_PARTITION) for the web archive bundle associated with this bundle. As DESCRIPTION is used to store info about the info bundle this sets the global properties of the info bundle as well.
- setBit
- in file bloom_filter_file.php, method BloomFilterFile::setBit()
Sets to true the ith bit position in the filter.
- setCrawlDelay
- in file web_queue_bundle.php, method WebQueueBundle::setCrawlDelay()
Sets the Crawl-delay of $host to the passed $value in seconds
- setCrawlMix
- in file crawl_model.php, method CrawlModel::setCrawlMix()
Stores in DB the supplied crawl mix object
- setCrawlParamsFromArray
- in file fetcher.php, method Fetcher::setCrawlParamsFromArray()
Sets parameters for fetching based on provided info struct ($info typically would come from the queue server)
- setCrawlSeedInfo
- in file crawl_controller.php, method CrawlController::setCrawlSeedInfo()
Handles a request to change the parameters of a crawl of a given
- setCrawlSeedInfo
- in file crawl_model.php, method CrawlModel::setCrawlSeedInfo()
Changes the crawl parameters of an existing crawl (can be while crawling). Not all fields are allowed to be updated.
- setCurrentIndexDatabaseName
- in file crawl_model.php, method CrawlModel::setCurrentIndexDatabaseName()
Sets the IndexArchive that will be used for search results
- setCurrentShard
- in file index_archive_bundle.php, method IndexArchiveBundle::setCurrentShard()
Sets the current shard to be the $i th shard in the index bundle.
- setIniInfo
- in file text_archive_bundle_iterator.php, method TextArchiveBundleIterator::setIniInfo()
Mutator method for controlling how this text archive iterator behaves. Normally, data on compression and on the start and stop delimiters is read from an ini file. This reads it from the supplied array.
- setLocaleObject
- in file locale_functions.php, function setLocaleObject()
Sets the language to be used for locale settings
- setQueueFlag
- in file web_queue_bundle.php, method WebQueueBundle::setQueueFlag()
Sets the flag which provides additional information about the
- setResultsPerBlock
- in file index_bundle_iterator.php, method IndexBundleIterator::setResultsPerBlock()
Sets the value of the result_per_block field. This field controls the maximum number of results that can be returned in one go by currentDocsWithWord()
- setResultsPerBlock
- in file union_iterator.php, method UnionIterator::setResultsPerBlock()
This method is supposed to set
- setResultsPerBlock
- in file intersect_iterator.php, method IntersectIterator::setResultsPerBlock()
This method is supposed to set
- setResultsPerBlock
- in file negation_iterator.php, method NegationIterator::setResultsPerBlock()
This method is supposed to set
- setSeedInfo
- in file crawl_model.php, method CrawlModel::setSeedInfo()
Writes a crawl.ini file with the provided data to the user's WORK_DIRECTORY
- setStaticPage
- in file locale_model.php, method LocaleModel::setStaticPage()
Saves the static page data for the page with the given name to the given locale_tag
- SettingsController
- in file settings_controller.php, class SettingsController
Controller used to handle search requests to SeekQuarry search site. Used to both get and display search results.
- SettingsView
- in file settings_view.php, class SettingsView
Draws the view on which people can control their search settings such as num links per screen and the language settings
- setUp
- in file string_array_test.php, method StringArrayTest::setUp()
We'll use two different tables one more representative of how the table
- setUp
- in file index_shard_test.php, method IndexShardTest::setUp()
Construct some index shard we can add documents to
- setUp
- in file bloom_filter_file_test.php, method BloomFilterFileTest::setUp()
Set up a bloom filter that can store up to 10 items and that saves
- setUp
- in file xlsx_processor_test.php, method XlsxProcessorTest::setUp()
sets up the initial content for the testcase by extracting
- setUp
- in file trie_test.php, method TrieTest::setUp()
We'll set up one Trie for testing purpose
- setUp
- in file url_parser_test.php, method UrlParserTest::setUp()
UrlParser uses static methods so doesn't do anything right now
- setUp
- in file web_queue_bundle_test.php, method WebQueueBundleTest::setUp()
Set up a web queue bundle that can store 1000 urls in ram, has bloom filter space for 1000 urls and which uses a maximum value returning priority queue.
- setUp
- in file web_archive_test.php, method WebArchiveTest::setUp()
Creates a new web archive object that we can add objects to
- setUp
- in file hash_table_test.php, method HashTableTest::setUp()
We'll use two different tables one more representative of how the table
- setUp
- in file epub_processor_test.php, method EpubProcessorTest::setUp()
Creates a new EpubProcessor object so that we can process an .epub format file.
- setUp
- in file priority_queue_test.php, method PriorityQueueTest::setUp()
We setup two queue one that always returns the max element, one that
- setUp
- in file it_stemmer_test.php, method ItStemmerTest::setUp()
- setUp
- in file phrase_parser_test.php, method PhraseParserTest::setUp()
PhraseParser uses static methods so doesn't do anything right now
- setUp
- in file unit_test.php, method UnitTest::setUp()
This method is called before each test case is run to set up the
- setUp
- in file pptx_processor_test.php, method PptxProcessorTest::setUp()
Creates a summary of pptx document to check
- setUp
- in file queue_server_test.php, method QueueServerTest::setUp()
Creates a QueueServer object with an initial set of indexed file types
- setUserSession
- in file user_model.php, method UserModel::setUserSession()
Stores into DB the $session associative array of given user
- setWorkDirectoryConfigFile
- in file profile_model.php, method ProfileModel::setWorkDirectoryConfigFile()
Modifies the config.php file so the WORK_DIRECTORY define points at $directory
- setWorldPermissions
- in file utility.php, function setWorldPermissions()
This is a callback function used in the process of recursively chmoding to 777 all files in a folder
- setWorldPermissionsRecursive
- in file datasource_manager.php, method DatasourceManager::setWorldPermissionsRecursive()
Recursively chmod a directory to 0777
- setWritePartition
- in file web_archive_bundle.php, method WebArchiveBundle::setWritePartition()
Advances the index of the write partition by one and creates the corresponding web archive.
- SHARD_BLOCK_POWER
- in file index_shard.php, class constant IndexShard::SHARD_BLOCK_POWER
Shard block size is 1<< this power
- SHARD_BLOCK_SIZE
- in file index_shard.php, class constant IndexShard::SHARD_BLOCK_SIZE
Size in bytes of one block in IndexShard
- sheetCount
- in file xlsx_processor.php, method XlsxProcessor::sheetCount()
Returns the count of worksheets in the xlsx file
- show_page
- in file static_controller.php, method StaticController::show_page()
This activity is used to display one of a set of static pages used by the Yioop Web Site
- shutdownDictionary
- in file queue_server.php, method QueueServer::shutdownDictionary()
During crawl shutdown, this function is called to do a final save and merge of the crawl dictionary, so that it is ready to serve queries.
- signin
- in file admin_controller.php, method AdminController::signin()
This method is used to sign in a user and initialize the data to be displayed in a view
- SigninElement
- in file signin_element.php, class SigninElement
Element responsible for drawing links to settings and login panels
- SigninModel
- in file signin_model.php, class SigninModel
This class is used to handle db results needed for a user to log in
- SigninView
- in file signin_view.php, class SigninView
This View is responsible for drawing the login screen for the admin panel of the Seek Quarry app
- SIGNIN_LINK
- in file config.php, constant SIGNIN_LINK
- SIMILAR_LINK
- in file config.php, constant SIMILAR_LINK
- simplifyUrl
- in file url_parser.php, method UrlParser::simplifyUrl()
Converts a url with a scheme into one without. Also removes trailing slashes from url. Shortens url to desired length by inserting ellipsis for part of it if necessary
- simplifyUrlTestCase
- in file url_parser_test.php, method UrlParserTest::simplifyUrlTestCase()
Tests simplifyUrl function used on SERP pages
- SINGLE_PAGE_TIMEOUT
- in file config.php, constant SINGLE_PAGE_TIMEOUT
time in seconds before we give up on a single page request
- SitemapProcessor
- in file sitemap_processor.php, class SitemapProcessor
Used to create crawl summary information for sitemap files
- SITES
- in file crawl_constants.php, class constant CrawlConstants::SITES
- SITE_INFO
- in file crawl_constants.php, class constant CrawlConstants::SITE_INFO
- SIZE
- in file crawl_constants.php, class constant CrawlConstants::SIZE
- slides
- in file pptx_processor.php, method PptxProcessor::slides()
Returns number of slides of pptx based on its document object
- SNIPPET_LENGTH_LEFT
- in file model.php, constant SNIPPET_LENGTH_LEFT
- SNIPPET_LENGTH_RIGHT
- in file model.php, constant SNIPPET_LENGTH_RIGHT
- SourceModel
- in file source_model.php, class SourceModel
Used to manage data related to video, news, and other search sources. Also used to manage data about available subsearches seen in SearchView
- SOURCE_NAME
- in file crawl_constants.php, class constant CrawlConstants::SOURCE_NAME
- Sqlite3Manager
- in file sqlite3_manager.php, class Sqlite3Manager
SQLite3 DatasourceManager
- SqliteManager
- in file sqlite_manager.php, class SqliteManager
SQLite DatasourceManager
- start
- in file news_updater.php, method NewsUpdater::start()
This is the function that should be called to get the newsupdater to
- start
- in file fetcher.php, method Fetcher::start()
This is the function that should be called to get the fetcher to start
- start
- in file query_tool.php, method QueryTool::start()
Runs the QueryTool on the supplied command line arguments
- start
- in file crawl_daemon.php, method CrawlDaemon::start()
Used to start a daemon running in the background
- start
- in file mirror.php, method Mirror::start()
This is the function that should be called to get the mirror to start
- start
- in file queue_server.php, method QueueServer::start()
This is the function that should be called to get the queue_server
- start
- in file arc_tool.php, method ArcTool::start()
Runs the ArcTool on the supplied command line arguments
- startCrawl
- in file admin_controller.php, method AdminController::startCrawl()
Called from @see manageCrawls to start a new crawl on the machines $machine_urls. Updates $data array with crawl start message
- startCrawl
- in file queue_server.php, method QueueServer::startCrawl()
Begins crawling based on time, order, restricted site $info. Setting up a crawl involves creating a queue bundle and an index archive bundle
- START_PARTITION
- in file crawl_constants.php, class constant CrawlConstants::START_PARTITION
- StaticController
- in file static_controller.php, class StaticController
This controller is used by the Yioop web site to display static pages.
- StaticView
- in file static_view.php, class StaticView
This View is responsible for drawing the landing page of the Seek Quarry app
- StatisticsController
- in file statistics_controller.php, class StatisticsController
Responsible for handling requests about global crawl statistics for
- StatisticsView
- in file statistics_view.php, class StatisticsView
Draws a view displaying statistical information about a web crawl such as number of hosts visited, distribution of file sizes, distribution of file type, distribution of languages, etc
- statistics_base_name
- in file crawl_constants.php, class constant CrawlConstants::statistics_base_name
- STATISTIC_REFRESH_RATE
- in file statistics_controller.php, class constant StatisticsController::STATISTIC_REFRESH_RATE
While computing the statistics page, number of seconds until a
- STATUS
- in file crawl_constants.php, class constant CrawlConstants::STATUS
- statuses
- in file crawl_daemon.php, method CrawlDaemon::statuses()
Returns the statuses of the running daemons
- statuses
- in file machine_controller.php, method MachineController::statuses()
Checks the running/non-running status of the
- stem
- in file tokenizer.php, method EnStemmer::stem()
Computes the stem of an English word
- stem
- in file tokenizer.php, method ItStemmer::stem()
Computes the stem of an Italian word. Example: guardando, guardandogli, guardandola, guardano all stem to guard
- stemmerTestCase
- in file it_stemmer_test.php, method ItStemmerTest::stemmerTestCase()
Tests whether the stem function for the Italian stemming algorithm
- stemTerms
- in file phrase_parser.php, method PhraseParser::stemTerms()
Splits supplied string based on white space, then stems each term according to the stemmer for $lang if one exists
- step0
- in file tokenizer.php, method ItStemmer::step0()
Handles attached pronoun
- step1
- in file tokenizer.php, method ItStemmer::step1()
Handles standard suffixes
- step2
- in file tokenizer.php, method ItStemmer::step2()
Handles verb suffixes
- step3a
- in file tokenizer.php, method ItStemmer::step3a()
Deletes a final a,e,i,o,a`,e`,i`,o` and a preceding i if in RV
- step3b
- in file tokenizer.php, method ItStemmer::step3b()
Replaces a final ch/gh by c/g if in RV
- stop
- in file crawl_daemon.php, method CrawlDaemon::stop()
Used to stop a daemon that is running in the background
- stopCrawl
- in file queue_server.php, method QueueServer::stopCrawl()
Used to stop the currently running crawl gracefully so that it can be restarted. This involved writing the queue's contents back to schedules, making the crawl's dictionary all the same tier and running any indexing_plugins.
- STOP_STATE
- in file crawl_constants.php, class constant CrawlConstants::STOP_STATE
- STORE_FLAG
- in file index_shard.php, class constant IndexShard::STORE_FLAG
Represents an empty prefix item
- StringArray
- in file string_array.php, class StringArray
Memory efficient implementation of persistent arrays
- StringArrayTest
- in file string_array_test.php, class StringArrayTest
Used to test that the StringArray class properly stores/retrieves values, and can handle loading and saving
- SUBDOCS
- in file crawl_constants.php, class constant CrawlConstants::SUBDOCS
- SUBDOCTYPE
- in file crawl_constants.php, class constant CrawlConstants::SUBDOCTYPE
- SubsearchElement
- in file subsearch_element.php, class SubsearchElement
Element responsible for drawing links to common subsearches
- suggest
- in file resource_controller.php, method ResourceController::suggest()
Used to get a keyword suggest trie. This sends additional
- SUMMARY
- in file crawl_constants.php, class constant CrawlConstants::SUMMARY
- SUMMARY_OFFSET
- in file crawl_constants.php, class constant CrawlConstants::SUMMARY_OFFSET
- SvgProcessor
- in file svg_processor.php, class SvgProcessor
Used to create crawl summary information
- syncGenDocOffsetsAmongstIterators
- in file intersect_iterator.php, method IntersectIterator::syncGenDocOffsetsAmongstIterators()
Finds the next generation and doc offset amongst all the iterators
- syncGenDocOffsetsAmongstIterators
- in file negation_iterator.php, method NegationIterator::syncGenDocOffsetsAmongstIterators()
Finds the next generation and doc offset amongst the all docs iterator and the term to be negated iterator such that the all iterator is strictly less than the term iterator.
- syncList
- in file resource_controller.php, method ResourceController::syncList()
Returns a list of syncable files and the modification times
- syncNotify
- in file resource_controller.php, method ResourceController::syncNotify()
Used to notify a machine that another machine acting as a mirror
- systemCheck
- in file admin_controller.php, method AdminController::systemCheck()
Checks to see if the current machine has php configured in a way Yioop! can run.
- socket_experiment.php
- procedural page socket_experiment.php
- string_array_test.php
- procedural page string_array_test.php
- string_cat_experiment.php
- procedural page string_cat_experiment.php
- searchsources_element.php
- procedural page searchsources_element.php
- signin_element.php
- procedural page signin_element.php
- subsearch_element.php
- procedural page subsearch_element.php
- search_view.php
- procedural page search_view.php
- settings_view.php
- procedural page settings_view.php
- signin_view.php
- procedural page signin_view.php
- static_view.php
- procedural page static_view.php
- statistics_view.php
- procedural page statistics_view.php
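
The ItStemmer entries rv(), step3a(), and step3b() in this index all refer to the region RV. The Python sketch below illustrates those three pieces under the assumption that ItStemmer follows the published Snowball Italian stemming algorithm, which the index itself does not confirm. The function names are hypothetical and are not the PHP methods, and the sketch omits steps 0 to 2 of the algorithm.

```python
VOWELS = set("aeiouàèìòù")


def rv_start(word):
    """Return the index where region RV begins (len(word) if RV is empty)."""
    n = len(word)
    if n < 2:
        return n
    if word[1] not in VOWELS:
        # Second letter is a consonant: RV follows the next vowel.
        for i in range(2, n):
            if word[i] in VOWELS:
                return i + 1
        return n
    if word[0] in VOWELS:
        # First two letters are vowels: RV follows the next consonant.
        for i in range(2, n):
            if word[i] not in VOWELS:
                return i + 1
        return n
    # Consonant followed by vowel: RV follows the third letter.
    return min(3, n)


def step3a(word, rv):
    """Delete a final a, e, i, o (or accented form) and a preceding i, if in RV."""
    if len(word) > rv and word[-1] in "aeioàèìò":
        word = word[:-1]
        if len(word) > rv and word[-1] == "i":
            word = word[:-1]
    return word


def step3b(word, rv):
    """Replace a final ch/gh by c/g if the suffix lies in RV."""
    if len(word) - 2 >= rv and word[-2:] in ("ch", "gh"):
        word = word[:-1]
    return word


rv = rv_start("crocchio")        # 3
stem = step3a("crocchio", rv)    # 'crocch'
print(step3b(stem, rv))          # crocc
```

RV is computed once on the original word and passed to both steps, so the boundary does not shift as endings are removed.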
t
- $test_case_results
- in file unit_test.php, variable UnitTest::$test_case_results
Used to store the results for each test sub case
- $test_objects
- in file unit_test.php, variable UnitTest::$test_objects
Used to hold objects to be used in tests
- $time
- in file crawl_daemon.php, variable CrawlDaemon::$time
Used by processHandler to decide when to update the lock file
- $total_time
- in file datasource_manager.php, variable DatasourceManager::$total_time
Used to store the total time taken to execute queries
- $to_advance_index
- in file negation_iterator.php, variable NegationIterator::$to_advance_index
Index of the iterator amongst those we are intersecting to advance
- $to_advance_index
- in file intersect_iterator.php, variable IntersectIterator::$to_advance_index
Index of the iterator amongst those we are intersecting to advance
- $to_crawl
- in file fetcher.php, variable Fetcher::$to_crawl
Contains the list of web pages to crawl from a queue_server
- $to_crawl_again
- in file fetcher.php, variable Fetcher::$to_crawl_again
Contains the list of web pages to crawl that failed on first attempt
- $to_crawl_archive
- in file web_queue_bundle.php, variable WebQueueBundle::$to_crawl_archive
WebArchive used to store urls that are to be crawled
- $to_crawl_queue
- in file web_queue_bundle.php, variable WebQueueBundle::$to_crawl_queue
the PriorityQueue used by this WebQueueBundle
- $to_crawl_table
- in file web_queue_bundle.php, variable WebQueueBundle::$to_crawl_table
the HashTable used by this WebQueueBundle
- $trie_array
- in file trie.php, variable Trie::$trie_array
A nested array used to represent the trie
- token_tool.php
- procedural page token_tool.php
- text_archive_bundle_iterator.php
- procedural page text_archive_bundle_iterator.php
- text_processor.php
- procedural page text_processor.php
- trie.php
- procedural page trie.php
- tokenizer.php
- procedural page tokenizer.php
- tearDown
- in file bloom_filter_file_test.php, method BloomFilterFileTest::tearDown()
Since a BloomFilterFile is a PersistentStructure it periodically saves
- tearDown
- in file string_array_test.php, method StringArrayTest::tearDown()
Since a StringArray is a PersistentStructure it periodically saves
- tearDown
- in file xlsx_processor_test.php, method XlsxProcessorTest::tearDown()
Can be used for cleanup activity
- tearDown
- in file index_shard_test.php, method IndexShardTest::tearDown()
Deletes any index shard files we may have created
- tearDown
- in file hash_table_test.php, method HashTableTest::tearDown()
Since a HashTable is a PersistentStructure it periodically saves
- tearDown
- in file url_parser_test.php, method UrlParserTest::tearDown()
UrlParser uses static methods so doesn't do anything right now
- tearDown
- in file web_archive_test.php, method WebArchiveTest::tearDown()
Delete any files associated with our test web archive
- tearDown
- in file it_stemmer_test.php, method ItStemmerTest::tearDown()
- tearDown
- in file queue_server_test.php, method QueueServerTest::tearDown()
Used to get rid of any object/files we created during a test case.
- tearDown
- in file epub_processor_test.php, method EpubProcessorTest::tearDown()
Delete any files associated with our test on EpubProcessor
- tearDown
- in file priority_queue_test.php, method PriorityQueueTest::tearDown()
Since our queues are persistent structures, we delete files that might be
- tearDown
- in file trie_test.php, method TrieTest::tearDown()
Since a Trie is not a PersistentStructure we don't need to do
- tearDown
- in file web_queue_bundle_test.php, method WebQueueBundleTest::tearDown()
Delete the directory and files associated with the WebQueueBundle
- tearDown
- in file phrase_parser_test.php, method PhraseParserTest::tearDown()
PhraseParser uses static methods so doesn't do anything right now
- tearDown
- in file pptx_processor_test.php, method PptxProcessorTest::tearDown()
Test object is set to null
- tearDown
- in file unit_test.php, method UnitTest::tearDown()
This method is called after each test case is run to clean up
- testDatabaseManager
- in file profile_model.php, method ProfileModel::testDatabaseManager()
Checks if $dbinfo provides info to connect to a working instance of app db.
- testEpubDescriptionTestCase
- in file epub_processor_test.php, method EpubProcessorTest::testEpubDescriptionTestCase()
Test case to check whether the description of the document is not empty.
- testEpubLangTestCase
- in file epub_processor_test.php, method EpubProcessorTest::testEpubLangTestCase()
Test case to check whether the language of the document is retrieved correctly.
- testEpubTitleTestCase
- in file epub_processor_test.php, method EpubProcessorTest::testEpubTitleTestCase()
Test case to check whether the title of the epub document is retrieved correctly.
- trie_test.php
- procedural page trie_test.php
- TEST_INFO
- in file config.php, constant TEST_INFO
bit of DEBUG_LEVEL used to indicate test cases should be displayable
- TextArchiveBundleIterator
- in file text_archive_bundle_iterator.php, class TextArchiveBundleIterator
Used to iterate through the records of a collection of text or compressed text-oriented records
- TextProcessor
- in file text_processor.php, class TextProcessor
Parent class common to all processors used to create crawl summary information that involves basically text data
- TEXT_SUFFIX
- in file nword_grams.php, class constant NWordGrams::TEXT_SUFFIX
Suffix appended to language tag to create the text file name containing bigrams.
- THUMB
- in file crawl_constants.php, class constant CrawlConstants::THUMB
- TIMESTAMP
- in file crawl_constants.php, class constant CrawlConstants::TIMESTAMP
- title
- in file xlsx_processor.php, method XlsxProcessor::title()
Returns title of a xlsx file from each worksheet
- title
- in file svg_processor.php, method SvgProcessor::title()
Returns html head title of a webpage based on its document object
- title
- in file rss_processor.php, method RssProcessor::title()
Returns html head title of a webpage based on its document object
- title
- in file html_processor.php, method HtmlProcessor::title()
Returns html head title of a webpage based on its document object
- TITLE
- in file crawl_constants.php, class constant CrawlConstants::TITLE
- title
- in file pptx_processor.php, method PptxProcessor::title()
Returns powerpoint head title of a pptx based on its document object
- titleTestCase
- in file xlsx_processor_test.php, method XlsxProcessorTest::titleTestCase()
Tests that the title is correct
- TITLE_LENGTH
- in file crawl_constants.php, class constant CrawlConstants::TITLE_LENGTH
- TITLE_LENGTH
- in file model.php, constant TITLE_LENGTH
- TITLE_WEIGHT
- in file config.php, constant TITLE_WEIGHT
BM25F weight for title text
- TITLE_WORDS
- in file crawl_constants.php, class constant CrawlConstants::TITLE_WORDS
- TITLE_WORD_SCORE
- in file crawl_constants.php, class constant CrawlConstants::TITLE_WORD_SCORE
- tl
- in file locale_functions.php, function tl()
Translate the supplied arguments into the current locale.
- toBinString
- in file utility.php, function toBinString()
Converts a string to a string where each char has been replaced by its binary equivalent
- ToggleHelper
- in file toggle_helper.php, class ToggleHelper
This helper class is used to draw an On-Off switch in a web page
- toggleHistory
- in file search_controller.php, method SearchController::toggleHistory()
The history toggle displays the year and month associated with the timestamp at which the page was cached.
- toHexString
- in file utility.php, function toHexString()
Converts a string to a string where each char has been replaced by its hexadecimal equivalent
- totalWeight
- in file priority_queue.php, method PriorityQueue::totalWeight()
Computes and returns the weight of all items in the priority queue
- TOTAL_TIME
- in file crawl_constants.php, class constant CrawlConstants::TOTAL_TIME
- TO_CRAWL
- in file crawl_constants.php, class constant CrawlConstants::TO_CRAWL
- translate
- in file locale_model.php, method LocaleModel::translate()
Translate an array consisting of an identifier string together with additional variable parameters into the current locale.
- translateDb
- in file model.php, method Model::translateDb()
Used to get the translation of a string_id stored in the database to the given locale.
- traverseDirectory
- in file datasource_manager.php, method DatasourceManager::traverseDirectory()
Recursively traverse a directory structure and call a callback function
- traverseDirectory
- in file search_api.php, function traverseDirectory()
Recursively traverse a directory structure and call a callback function
- traverseExtractRecursive
- in file locale_model.php, method LocaleModel::traverseExtractRecursive()
Traverses a directory and its subdirectories looking for files
- Tree
- in file recipe_plugin.php, class Tree
Class to define a minimum spanning tree (MST). constructMST constructs the minimum spanning tree using a heap. formCluster forms clusters by deleting the most expensive edge. BreadthFirstSearch is used to traverse the MST.
- TreeCluster
- in file recipe_plugin.php, class TreeCluster
heap to maintain the tree
- Trie
- in file trie.php, class Trie
Implements a trie data structure which can be used to store terms read from a dictionary in a succinct way
- TrieTest
- in file trie_test.php, class TrieTest
Used to test that the Trie class properly stores words that could be used for an autosuggest dictionary
- TWO_MINUTES
- in file source_model.php, class constant SourceModel::TWO_MINUTES
Number of seconds in two minutes
- TYPE
- in file crawl_constants.php, class constant CrawlConstants::TYPE
- toggle_helper.php
- procedural page toggle_helper.php
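The toBinString and toHexString entries above describe replacing each character of a string with its binary or hexadecimal equivalent. A minimal Python sketch of that idea follows; it is not the PHP code in utility.php, and the function names, the use of bytes as input, and the space separator between characters are assumptions made for illustration.

```python
def to_hex_string(data: bytes) -> str:
    """Replace each byte with its two-digit hexadecimal form.

    Assumption: bytes are separated by a single space in the output.
    """
    return " ".join(f"{byte:02x}" for byte in data)


def to_bin_string(data: bytes) -> str:
    """Replace each byte with its eight-digit binary form.

    Assumption: bytes are separated by a single space in the output.
    """
    return " ".join(f"{byte:08b}" for byte in data)


print(to_hex_string(b"Hi"))  # 48 69
print(to_bin_string(b"Hi"))  # 01001000 01101001
```

Helpers like these are mainly useful when inspecting packed binary data, such as postings or hashes, while debugging.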
u
- $unsaved_operations
- in file persistent_structure.php, variable PersistentStructure::$unsaved_operations
Number of operations since the last save
- $url_exists_filter_bundle
- in file web_queue_bundle.php, variable WebQueueBundle::$url_exists_filter_bundle
BloomFilter used to keep track of which urls we've already seen
- union_iterator.php
- procedural page union_iterator.php
- unit_test.php
- procedural page unit_test.php
- upgrade_functions.php
- procedural page upgrade_functions.php
- url_parser.php
- procedural page url_parser.php
- utility.php
- procedural page utility.php
- user_model.php
- procedural page user_model.php
- url_parser_test.php
- procedural page url_parser_test.php
- UI_FLAGS
- in file crawl_constants.php, class constant CrawlConstants::UI_FLAGS
- unbase64Hash
- in file utility.php, function unbase64Hash()
Decodes a crawl hash number from base64 to raw ASCII
- uncompress
- in file compressor.php, method Compressor::uncompress()
Used to unapply the compress filter as when data is read out of a WebArchive.
- uncompress
- in file non_compressor.php, method NonCompressor::uncompress()
Used to unapply the compress filter as when data is read out of a WebArchive. In this case, the unapplying filter does nothing.
- uncompress
- in file gzip_compressor.php, method GzipCompressor::uncompress()
Used to unapply the compress filter as when data is read out of a WebArchive. In this case, unapplying the filter means gunzipping.
- uncompressInt
- in file non_compressor.php, method NonCompressor::uncompressInt()
Used to uncompress an int from a fixed length string in the format of the compression algorithm underlying the compressor. Since this compressor doesn't compress we just use unpack
- uncompressInt
- in file gzip_compressor.php, method GzipCompressor::uncompressInt()
Used to uncompress an int from a fixed length string in the format of
- uncompressInt
- in file compressor.php, method Compressor::uncompressInt()
Used to uncompress an int from a fixed length string in the format of the compression algorithm underlying the compressor.
- UnionIterator
- in file union_iterator.php, class UnionIterator
Used to iterate over the documents which occur in any of a set of WordIterator results
- UnitTest
- in file unit_test.php, class UnitTest
Base class for all the SeekQuarry/Yioop engine Unit tests
- UNIT_TEST_MODE
- in file queue_server_test.php, constant UNIT_TEST_MODE
- unlinkRecursive
- in file search_api.php, function unlinkRecursive()
Recursively delete a directory
- unlinkRecursive
- in file datasource_manager.php, method DatasourceManager::unlinkRecursive()
Recursively delete a directory
- unpackDoclenNum
- in file index_shard.php, method IndexShard::unpackDoclenNum()
Used to extract from a 32 bit unsigned int, a pair which represents the length of a document together with the number of keys in its doc_id
- unpackFloat
- in file utility.php, function unpackFloat()
Unpacks a float from a 4 char string
- unpackInt
- in file utility.php, function unpackInt()
Unpacks an int from a 4 char string
- unpackListModified9
- in file utility.php, function unpackListModified9()
Decodes a single word with high two bits off according to modified 9
- unpackPosting
- in file utility.php, function unpackPosting()
Given a packed integer string, uses the top three bytes to calculate a doc_index of a document in the shard, and uses the low order byte to compute the number of occurrences of a word in that document.
- unpackWordDocs
- in file index_shard.php, method IndexShard::unpackWordDocs()
Takes the word docs string and splits it into posting lists which are assigned to particular words in the words dictionary array.
- unsetVariable
- in file page_rule_parser.php, method PageRuleParser::unsetVariable()
Unsets the key $field (or the crawl constant it corresponds to) in $page_data. If it is a crawl constant, it doesn't unset it; it just sets it to the empty string.
- update
- in file fetch_controller.php, method FetchController::update()
Processes Robot, To Crawl, and Index data sent from a fetcher. Acknowledges to the fetcher if this data was received okay.
- update
- in file machine_model.php, method MachineModel::update()
Used to start or stop a queue_server, fetcher, mirror instance on a machine managed by the current one
- update
- in file machine_controller.php, method MachineController::update()
Used to start/stop a queue_server/fetcher of the current Yioop instance
- updateBuffer
- in file text_archive_bundle_iterator.php, method TextArchiveBundleIterator::updateBuffer()
If reading from a gzbuffer file goes off the end of the current buffer, reads in the next block from archive file.
- updateCronTime
- in file cron_model.php, method CronModel::updateCronTime()
Updates the Cron timestamp to the current time.
- updateDisallowedQuotaSites
- in file queue_server.php, method QueueServer::updateDisallowedQuotaSites()
This is called whenever the crawl options are modified to parse
- updateFeedItems
- in file source_model.php, method SourceModel::updateFeedItems()
For each feed source downloads the feeds, checks which items are not in the database, adds them and updates the inverted index for feeds
- updateFoundSites
- in file fetcher.php, method Fetcher::updateFoundSites()
Updates the $this->found_sites array with data from the most recently
- updateLocale
- in file locale_model.php, method LocaleModel::updateLocale()
Updates the configure.ini file and static pages for a particular locale.
- updateLocales
- in file locale_model.php, method LocaleModel::updateLocales()
Cycles through locale subdirectories in LOCALE_DIR, for each locale it merges out the current general_ini and strings data.
- updateLocaleSubFolder
- in file locale_model.php, method LocaleModel::updateLocaleSubFolder()
Copies over subfolder items of the correct file extensions which exist in a fallback directory, but not in the actual directory of a locale.
- updateMostRecentFetcher
- in file queue_server.php, method QueueServer::updateMostRecentFetcher()
Determines the most recent fetcher that has spoken with the
- updatePartition
- in file text_archive_bundle_iterator.php, method TextArchiveBundleIterator::updatePartition()
Helper function for nextChunk to advance the partition if we are at the end of the current archive file
- updateProfile
- in file profile_model.php, method ProfileModel::updateProfile()
Outputs a profile.php file in the given directory containing profile data based on new and old data sources
- updateProfileFields
- in file admin_controller.php, method AdminController::updateProfileFields()
- updateResultPage
- in file searchfilters_model.php, method SearchfiltersModel::updateResultPage()
Save/updates/deletes an override of a search engine result summary page. The information stored will be used instead of what was actually in the index when it comes to displaying search results for a page.
- updateScheduler
- in file fetcher.php, method Fetcher::updateScheduler()
Updates the queue_server about sites that have been crawled.
- updateStringData
- in file locale_model.php, method LocaleModel::updateStringData()
Updates the identifier_string-translation pairs (both static and dynamic) for a given locale
- updateTranslation
- in file locale_model.php, method LocaleModel::updateTranslation()
Computes a string of the form string_id = 'translation' for a string_id
- upgradeDatabaseVersion1
- in file upgrade_functions.php, function upgradeDatabaseVersion1()
Upgrades a Version 0 version of the Yioop! database to a Version 1 version
- upgradeDatabaseVersion2
- in file upgrade_functions.php, function upgradeDatabaseVersion2()
Upgrades a Version 1 version of the Yioop! database to a Version 2 version
- upgradeDatabaseVersion3
- in file upgrade_functions.php, function upgradeDatabaseVersion3()
Upgrades a Version 2 version of the Yioop! database to a Version 3 version
- upgradeDatabaseVersion4
- in file upgrade_functions.php, function upgradeDatabaseVersion4()
Upgrades a Version 3 version of the Yioop! database to a Version 4 version
- upgradeDatabaseVersion5
- in file upgrade_functions.php, function upgradeDatabaseVersion5()
Upgrades a Version 4 version of the Yioop! database to a Version 5 version
- upgradeDatabaseVersion6
- in file upgrade_functions.php, function upgradeDatabaseVersion6()
Upgrades a Version 5 version of the Yioop! database to a Version 6 version
- upgradeDatabaseVersion7
- in file upgrade_functions.php, function upgradeDatabaseVersion7()
Upgrades a Version 6 version of the Yioop! database to a Version 7 version
- upgradeDatabaseVersion8
- in file upgrade_functions.php, function upgradeDatabaseVersion8()
Upgrades a Version 7 version of the Yioop! database to a Version 8 version
- upgradeDatabaseVersion9
- in file upgrade_functions.php, function upgradeDatabaseVersion9()
Upgrades a Version 8 version of the Yioop! database to a Version 9 version
- upgradeDatabaseVersion10
- in file upgrade_functions.php, function upgradeDatabaseVersion10()
Upgrades a Version 9 version of the Yioop! database to a Version 10 version
- upgradeDatabaseVersion11
- in file upgrade_functions.php, function upgradeDatabaseVersion11()
Upgrades a Version 10 version of the Yioop! database to a Version 11 version
- upgradeDatabaseVersion12
- in file upgrade_functions.php, function upgradeDatabaseVersion12()
Upgrades a Version 11 version of the Yioop! database to a Version 12 version
- upgradeDatabaseVersion13
- in file upgrade_functions.php, function upgradeDatabaseVersion13()
Upgrades a Version 12 version of the Yioop! database to a Version 13 version
- upgradeDatabaseVersion14
- in file upgrade_functions.php, function upgradeDatabaseVersion14()
Upgrades a Version 13 version of the Yioop! database to a Version 14 version
- upgradeDatabaseVersion15
- in file upgrade_functions.php, function upgradeDatabaseVersion15()
Upgrades a Version 14 version of the Yioop! database to a Version 15 version
- upgradeDatabaseWorkDirectory
- in file upgrade_functions.php, function upgradeDatabaseWorkDirectory()
If the database data of Yioop! is older than the version of the
- upgradeDatabaseWorkDirectoryCheck
- in file upgrade_functions.php, function upgradeDatabaseWorkDirectoryCheck()
Checks to see if the database data or work_dir folder of Yioop! is from an
- upgradeLocales
- in file upgrade_functions.php, function upgradeLocales()
If the locale data of Yioop! in the work directory is older than the currently running Yioop! then this function is called to at least try to copy the new strings into the old profile.
- upgradeLocalesCheck
- in file upgrade_functions.php, function upgradeLocalesCheck()
Checks to see if the locale data of Yioop! in the work dir is older than the
- uploadCrawlData
- in file fetcher.php, method Fetcher::uploadCrawlData()
Sends to-crawl, robot, and index data to the current queue server.
- URL
- in file crawl_constants.php, class constant CrawlConstants::URL
- urlMemberSiteArray
- in file url_parser.php, method UrlParser::urlMemberSiteArray()
Checks if the url belongs to one of the sites listed in site_array. Sites can be given either in the form domain:host or in the form of a url, in which case it is checked that the site url is a substring of the passed url.
- urlMemberSiteArrayTestCase
- in file url_parser_test.php, method UrlParserTest::urlMemberSiteArrayTestCase()
urlMemberSiteArray is a function called by both allowedToCrawlSite
- UrlParser
- in file url_parser.php, class UrlParser
Library of functions used to manipulate and to extract components from urls
- UrlParserTest
- in file url_parser_test.php, class UrlParserTest
Used to test the UrlParser class. For now, want to see that the method canonicalLink is working correctly and that isPathMemberRegexPaths (used in robot_processor.php) works
- URL_FILTER_SIZE
- in file config.php, constant URL_FILTER_SIZE
bloom filters are used to keep track of which urls are visited, this parameter determines up to how many urls will be stored in a single filter. Additional filters are read to and from disk.
- URL_INFO
- in file crawl_constants.php, class constant CrawlConstants::URL_INFO
- URL_WEIGHT
- in file crawl_constants.php, class constant CrawlConstants::URL_WEIGHT
- usageMessageAndExit
- in file query_tool.php, method QueryTool::usageMessageAndExit()
Outputs the "how to use this tool message" and then exit()'s.
- usageMessageAndExit
- in file arc_tool.php, method ArcTool::usageMessageAndExit()
Outputs the "how to use this tool message" and then exit()'s.
- UserModel
- in file user_model.php, class UserModel
This class is used to handle database statements related to User Administration
- USER_AGENT
- in file config.php, constant USER_AGENT
this is the User-Agent name the crawler provides
- USER_AGENT_SHORT
- in file config.php, constant USER_AGENT_SHORT
- USE_CACHE
- in file arc_tool.php, constant USE_CACHE
USE_CACHE false rules out file cache as well
- USE_FILECACHE
- in file config.php, constant USE_FILECACHE
- USE_MEMCACHE
- in file config.php, constant USE_MEMCACHE
- utf8chr
- in file locale_functions.php, function utf8chr()
Given a unicode codepoint convert it to UTF-8
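The unpackInt and unpackPosting entries above describe reading an integer from a 4 char string and then splitting it into a doc_index (top three bytes) and an occurrence count (low order byte). A minimal Python sketch of that split follows. It is not the code in utility.php: the function names are illustrative, and big-endian byte order is an assumption, since the index does not state the byte order.

```python
import struct
from typing import Tuple


def unpack_int(packed: bytes) -> int:
    """Read an unsigned 32-bit int from a 4-byte string.

    Assumption: big-endian byte order.
    """
    return struct.unpack(">I", packed)[0]


def unpack_posting(packed: bytes) -> Tuple[int, int]:
    """Split a packed 4-byte posting into (doc_index, occurrences).

    The top three bytes hold the doc_index; the low order byte holds
    the number of occurrences of the word in that document.
    """
    value = unpack_int(packed)
    doc_index = value >> 8      # drop the low order byte
    occurrences = value & 0xFF  # keep only the low order byte
    return doc_index, occurrences


# Build a posting for document 5 with 3 occurrences, then unpack it.
packed = struct.pack(">I", (5 << 8) | 3)
print(unpack_posting(packed))  # (5, 3)
```

Under this layout a single posting can address up to 2^24 documents in a shard and record at most 255 occurrences.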
v
- $value_size
- in file hash_table.php, variable HashTable::$value_size
The size in bytes of values associated with keys
- $value_size
- in file priority_queue.php, variable PriorityQueue::$value_size
Number of bytes needed to store a value associated with a weight
- $version
- in file web_archive.php, variable WebArchive::$version
version number of the current archive
- $video_sources
- in file queue_server.php, variable QueueServer::$video_sources
List of media sources mainly to determine the value of the media:
- $video_sources
- in file fetcher.php, variable Fetcher::$video_sources
List of video sources mainly to determine the value of the media:
- $view
- in file layout.php, variable Layout::$view
The view that is to be drawn on this layout
- $view
- in file element.php, variable Element::$view
The View on which this Element is drawn
- $views
- in file admin_controller.php, variable AdminController::$views
Says which views to load for this controller
- $views
- in file resource_controller.php, variable ResourceController::$views
Only outputs JSON data so don't need view
- $views
- in file settings_controller.php, variable SettingsController::$views
Load the SettingsView
- $views
- in file static_controller.php, variable StaticController::$views
Says which views to load for this controller.
- $views
- in file search_controller.php, variable SearchController::$views
Says which views to load for this controller.
- $views
- in file archive_controller.php, variable ArchiveController::$views
This controller does not make use of any views
- $views
- in file crawl_controller.php, variable CrawlController::$views
Only outputs serialized php data so don't need view
- $views
- in file fetch_controller.php, variable FetchController::$views
Load FetchView to return results to fetcher
- $views
- in file controller.php, variable Controller::$views
Array of the view classes used by this controller
- $views
- in file statistics_controller.php, variable StatisticsController::$views
Only outputs JSON data so don't need view
- $views
- in file machine_controller.php, variable MachineController::$views
Only outputs JSON data so don't need view
- vByteDecode
- in file utility.php, function vByteDecode()
Decodes from a string using variable byte coding an integer.
- vByteEncode
- in file utility.php, function vByteEncode()
Encodes an integer using variable byte coding.
- Vertex
- in file recipe_plugin.php, class Vertex
class to define vertex
- VideourlHelper
- in file videourl_helper.php, class VideourlHelper
Helper used to draw thumbnails for video sites
- VIDEO_SOURCES
- in file crawl_constants.php, class constant CrawlConstants::VIDEO_SOURCES
- View
- in file view.php, class View
Base View Class. A View is used to display the output of controller activity
- viewLinksByYearMonth
- in file search_controller.php, method SearchController::viewLinksByYearMonth()
Display links based on selected year and month in History UI
- videourl_helper.php
- procedural page videourl_helper.php
- view.php
- procedural page view.php
- visited
- in file recipe_plugin.php, method Vertex::visited()
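The vByteEncode and vByteDecode entries above refer to variable byte coding of integers. A minimal Python sketch of one common convention follows: seven payload bits per byte, least significant group first, with the high bit set on every byte except the last. Whether utility.php uses this exact bit layout is an assumption, and the function names and the (value, next_offset) return shape are illustrative.

```python
from typing import Tuple


def vbyte_encode(n: int) -> bytes:
    """Encode a non-negative int, 7 bits per byte, low group first.

    Assumption: a set high bit means more bytes follow.
    """
    if n < 0:
        raise ValueError("variable byte coding needs a non-negative int")
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # payload bits plus continuation flag
        n >>= 7
    out.append(n)  # final byte has the high bit clear
    return bytes(out)


def vbyte_decode(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one int starting at offset; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:  # high bit clear: this was the last byte
            return result, offset
        shift += 7


encoded = vbyte_encode(300)
print(encoded.hex())          # ac02
print(vbyte_decode(encoded))  # (300, 2)
```

Small values take a single byte, which is why this kind of coding suits posting lists where most gaps between document indexes are small.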
w
- $waiting_hosts
- in file queue_server.php, variable QueueServer::$waiting_hosts
This is a list of hosts whose robots.txt file had a Crawl-delay directive
- $web_archive
- in file fetcher.php, variable Fetcher::$web_archive
WebArchiveBundle used to store complete web pages and auxiliary data
- $web_queue
- in file queue_server.php, variable QueueServer::$web_queue
Holds the WebQueueBundle for the crawl. This bundle encapsulates
- $weight
- in file intersect_iterator.php, variable IntersectIterator::$weight
A weighting factor to multiply with each doc SCORE returned from this
- $weight_size
- in file priority_queue.php, variable PriorityQueue::$weight_size
Number of bytes needed to store a weight in the queue
- $words
- in file index_shard.php, variable IndexShard::$words
Stores the array of word entries for this shard
- $words_len
- in file index_shard.php, variable IndexShard::$words_len
Stores length of the words array in the shard on disk. Only set if we're in $read_only_from_disk mode
- $word_docs
- in file index_shard.php, variable IndexShard::$word_docs
This string is non-empty when shard is loaded and in its packed state.
- $word_docs_len
- in file index_shard.php, variable IndexShard::$word_docs_len
Length of $word_docs as a string
- $word_docs_packed
- in file index_shard.php, variable IndexShard::$word_docs_packed
Keeps track of the packed/unpacked state of the word_docs list
- $word_iterator_map
- in file intersect_iterator.php, variable IntersectIterator::$word_iterator_map
An array holding iterator numbers corresponding to the word key
- $word_key
- in file word_iterator.php, variable WordIterator::$word_key
hash of word that the iterator iterates over
- $word_postings
- in file index_shard.php, variable IndexShard::$word_postings
Used to hold word_id, posting_len, posting triples as a memory efficient
- $write_partition
- in file web_archive_bundle.php, variable WebArchiveBundle::$write_partition
The index of the partition to which new documents will be added
- $writing_mode
- in file locale_model.php, variable LocaleModel::$writing_mode
Combination of text direction and block progression as a string. Has one
- warc_archive_bundle_iterator.php
- procedural page warc_archive_bundle_iterator.php
- web_archive_bundle_iterator.php
- procedural page web_archive_bundle_iterator.php
- word_iterator.php
- procedural page word_iterator.php
- web_archive.php
- procedural page web_archive.php
- web_archive_bundle.php
- procedural page web_archive_bundle.php
- web_queue_bundle.php
- procedural page web_queue_bundle.php
- web_archive_test.php
- procedural page web_archive_test.php
- web_queue_bundle_test.php
- procedural page web_queue_bundle_test.php
- web_layout.php
- procedural page web_layout.php
- w1256ToUTF8
- in file locale_functions.php, function w1256ToUTF8()
Convert the string $str encoded in Windows-1256 into UTF-8
- WAITING_START_MESSAGE_STATE
- in file crawl_constants.php, class constant CrawlConstants::WAITING_START_MESSAGE_STATE
- WarcArchiveBundleIterator
- in file warc_archive_bundle_iterator.php, class WarcArchiveBundleIterator
Used to iterate through the records of a collection of warc files stored in
- WARC_ID
- in file crawl_constants.php, class constant CrawlConstants::WARC_ID
- WebArchive
- in file web_archive.php, class WebArchive
Code used to manage web archive files
- WebArchiveBundle
- in file web_archive_bundle.php, class WebArchiveBundle
A web archive bundle is a collection of web archives which are managed
- WebArchiveBundleIterator
- in file web_archive_bundle_iterator.php, class WebArchiveBundleIterator
Class used to model iterating documents indexed in a WebArchiveBundle. This would typically be for the purpose of re-indexing these documents.
- WebArchiveTest
- in file web_archive_test.php, class WebArchiveTest
UnitTest for the WebArchive class. A web archive is used to store array-based objects persistently to a file. This class tests storing and retrieving from such an archive.
- webdecode
- in file utility.php, function webdecode()
Decodes a string encoded by webencode
- webencode
- in file utility.php, function webencode()
Encodes a string in a format suitable for post data (mainly, base64, but str_replace data that might mess up post in result)
- WebLayout
- in file web_layout.php, class WebLayout
Layout used for the seek_quarry Website including pages such as search landing page and settings page
- WebQueueBundle
- in file web_queue_bundle.php, class WebQueueBundle
Encapsulates the data structures needed to have a queue of to crawl urls
- WebQueueBundleTest
- in file web_queue_bundle_test.php, class WebQueueBundleTest
UnitTest for the WebQueueBundle class.
- WEB_ACCESS
- in file config.php, constant WEB_ACCESS
- WEB_ARCHIVE_VERSION
- in file web_archive.php, class constant WebArchive::WEB_ARCHIVE_VERSION
Version number to use in the WebArchive header if constructing a new
- WEB_CRAWL
- in file crawl_constants.php, class constant CrawlConstants::WEB_CRAWL
- weight
- in file mediawiki_bundle_iterator.php, method MediaWikiArchiveBundleIterator::weight()
Estimates the importance of the site according to the weighting of
- weight
- in file database_bundle_iterator.php, method DatabaseBundleIterator::weight()
Estimates the importance of the site according to the weighting of
- WEIGHT
- in file crawl_constants.php, class constant CrawlConstants::WEIGHT
- weight
- in file mix_archive_bundle_iterator.php, method MixArchiveBundleIterator::weight()
Estimates the importance of the site according to the weighting of
- weight
- in file web_archive_bundle_iterator.php, method WebArchiveBundleIterator::weight()
Estimates the importance of the site according to the weighting of
- weight
- in file odp_rdf_bundle_iterator.php, method OdpRdfArchiveBundleIterator::weight()
Estimates the importance of the site according to the weighting of
- weight
- in file text_archive_bundle_iterator.php, method TextArchiveBundleIterator::weight()
Estimates the importance of the site according to the weighting of
- weight
- in file archive_bundle_iterator.php, method ArchiveBundleIterator::weight()
Estimates the importance of the site according to the weighting of
- weightedCount
- in file index_shard.php, method IndexShard::weightedCount()
Used to sum over the occurrences in a position list counting with weight based on term location in the document
- WIKI_DUMP_REDIRECT
- in file nword_grams.php, class constant NWordGrams::WIKI_DUMP_REDIRECT
- WIKI_DUMP_TITLE
- in file nword_grams.php, class constant NWordGrams::WIKI_DUMP_TITLE
- WIKI_PAGE_STYLES
- in file mediawiki_bundle_iterator.php, constant WIKI_PAGE_STYLES
Used to define the styles we put on cache wiki pages
- withinQuota
- in file queue_server.php, method QueueServer::withinQuota()
Checks if the $url is from a site which has an hourly quota to download.
- WordIterator
- in file word_iterator.php, class WordIterator
Used to iterate through the documents associated with a word in an IndexArchiveBundle. It also makes it easy to get the summaries of these documents.
- WORD_ITEM_LEN
- in file index_shard.php, class constant IndexShard::WORD_ITEM_LEN
Length of a Word entry in bytes in the shard
- WORD_KEY_LEN
- in file index_shard.php, class constant IndexShard::WORD_KEY_LEN
Length of a word entry's key in bytes
- WORD_POSTING_COPY_LEN
- in file index_shard.php, class constant IndexShard::WORD_POSTING_COPY_LEN
Bytes of tmp string allowed during flattenings
- WORD_SUGGEST
- in file config.php, constant WORD_SUGGEST
- workDirectory
- in file configure_tool.php, method ConfigureTool::workDirectory()
Used to create/change the location of this Yioop instance's work
- WORK_DIRECTORY
- in file config.php, constant WORK_DIRECTORY
- writeAdminMessage
- in file queue_server.php, method QueueServer::writeAdminMessage()
Used to write an admin crawl status message during a start or stop crawl.
- writeArchiveCrawlInfo
- in file queue_server.php, method QueueServer::writeArchiveCrawlInfo()
Used to write info about the current recrawl to file as well as to
- writeCrawlStatus
- in file queue_server.php, method QueueServer::writeCrawlStatus()
Writes status information about the current crawl so that the webserver app can use it for its display.
- writeInfoBlock
- in file web_archive.php, method WebArchive::writeInfoBlock()
Serializes and applies the compressor to an info block and writes it at
x
- xlsx_processor.php
- procedural page xlsx_processor.php
- xml_processor.php
- procedural page xml_processor.php
- xlsx_processor_test.php
- procedural page xlsx_processor_test.php
- XlsxProcessor
- in file xlsx_processor.php, class XlsxProcessor
Used to create crawl summary information for xlsx files
- XlsxProcessorTest
- in file xlsx_processor_test.php, class XlsxProcessorTest
Used to test that the XlsxProcessor class provides the basic functionality of getting the title, description, languages and links
- XmlProcessor
- in file xml_processor.php, class XmlProcessor
Used to create crawl summary information for XML files (those served as text/xml)
- xmlToObject
- in file epub_processor.php, method EpubProcessor::xmlToObject()
Used to extract the DOM tree containing information about the epub file (such as the title, author, language, and unique identifier of the book) from a string consisting of the content of the ebook publication's OPF file.
_
- __construct
- in file recipe_plugin.php, method Edge::__construct()
- __construct
- in file queue_server.php, method QueueServer::__construct()
Holds the post processors selected in the crawl options page
- __construct
- in file query_tool.php, method QueryTool::__construct()
Initializes the QueryTool, for now does nothing
- __construct
- in file recipe_plugin.php, method Queue::__construct()
- __construct
- in file recipe_plugin.php, method Tree::__construct()
- __construct
- in file searchfilters_model.php, method SearchfiltersModel::__construct()
- __construct
- in file role_model.php, method RoleModel::__construct()
- __construct
- in file recipe_plugin.php, method Vertex::__construct()
- __construct
- in file profile_model.php, method ProfileModel::__construct()
- __construct
- in file priority_queue.php, method PriorityQueue::__construct()
Makes a priority queue (implemented as an array heap) with the given operating parameters
- __construct
- in file page_rule_parser.php, method PageRuleParser::__construct()
Constructs a PageRuleParser using the supplied page_rules
- __construct
- in file page_processor.php, method PageProcessor::__construct()
Sets up any indexing plugins associated with this page processor
- __construct
- in file odp_rdf_bundle_iterator.php, method OdpRdfArchiveBundleIterator::__construct()
Creates an open directory rdf archive iterator with the given parameters.
- __construct
- in file parallel_model.php, method ParallelModel::__construct()
- __construct
- in file pdo_manager.php, method PdoManager::__construct()
- __construct
- in file phrase_model.php, method PhraseModel::__construct()
- __construct
- in file persistent_structure.php, method PersistentStructure::__construct()
Sets up the file name and save frequency for the PersistentStructure, and initializes the operation count
- __construct
- in file signin_model.php, method SigninModel::__construct()
- __construct
- in file source_model.php, method SourceModel::__construct()
Just calls the parent class constructor
- __construct
- in file web_archive_bundle.php, method WebArchiveBundle::__construct()
Makes or initializes an existing WebArchiveBundle with the given characteristics
- __construct
- in file web_archive.php, method WebArchive::__construct()
Makes or initializes a WebArchive object using the supplied parameters
- __construct
- in file warc_archive_bundle_iterator.php, method WarcArchiveBundleIterator::__construct()
Creates a warc archive iterator with the given parameters.
- __construct
- in file web_archive_bundle_iterator.php, method WebArchiveBundleIterator::__construct()
Creates a web archive iterator with the given parameters.
- __construct
- in file web_queue_bundle.php, method WebQueueBundle::__construct()
Makes a WebQueueBundle with the provided parameters
- __construct
- in file word_iterator.php, method WordIterator::__construct()
Creates a word iterator with the given parameters.
- __construct
- in file web_queue_bundle_test.php, method WebQueueBundleTest::__construct()
Sets up a minimal DBMS manager class so that we will be able to use
- __construct
- in file view.php, method View::__construct()
The constructor reads in any Element and Helper subclasses which are needed to draw the view. It also reads in the Layout subclass on which the View will be drawn.
- __construct
- in file user_model.php, method UserModel::__construct()
Just calls the parent class constructor
- __construct
- in file string_array.php, method StringArray::__construct()
Initializes the fields of the StringArray and its parent class PersistentStructure. Creates a null-filled string array of size $this->string_array_size to store data in.
- __construct
- in file sqlite_manager.php, method SqliteManager::__construct()
- __construct
- in file sqlite3_manager.php, method Sqlite3Manager::__construct()
- __construct
- in file text_archive_bundle_iterator.php, method TextArchiveBundleIterator::__construct()
Creates a text archive iterator with the given parameters.
- __construct
- in file trie.php, method Trie::__construct()
Creates and returns an empty trie. Sets the end of term character
- __construct
- in file unit_test.php, method UnitTest::__construct()
Constructor should be overridden to do any set up that occurs before
- __construct
- in file union_iterator.php, method UnionIterator::__construct()
Creates a union iterator with the given parameters.
- __construct
- in file non_compressor.php, method NonCompressor::__construct()
Constructor does nothing
- __construct
- in file news_updater.php, method NewsUpdater::__construct()
Sets up the field variables so that news updating can begin
- __construct
- in file doc_iterator.php, method DocIterator::__construct()
Creates a word iterator with the given parameters.
- __construct
- in file datasource_manager.php, method DatasourceManager::__construct()
Sets up the query_log for query statistics
- __construct
- in file database_bundle_iterator.php, method DatabaseBundleIterator::__construct()
Creates a database archive iterator with the given parameters. This kind of iterator is used to cycle through the results of a SQL query to a database, so that the results might be indexed by Yioop.
- __construct
- in file element.php, method Element::__construct()
Constructor stores a reference to the view this element will reside on
- __construct
- in file fetcher.php, method Fetcher::__construct()
Sets up the field variables so that crawling can begin
- __construct
- in file group_iterator.php, method GroupIterator::__construct()
Creates a group iterator with the given parameters.
- __construct
- in file file_cache.php, method FileCache::__construct()
Creates the directory for the file cache, sets how frequently all items in the cache expire
- __construct
- in file cron_model.php, method CronModel::__construct()
- __construct
- in file crawl_model.php, method CrawlModel::__construct()
- __construct
- in file bloom_filter_bundle.php, method BloomFilterBundle::__construct()
Creates, or loads if they already exist, the directory structure and BloomFilterFiles used by this bundle
- __construct
- in file arc_tool.php, method ArcTool::__construct()
Initializes the ArcTool, for now does nothing
- __construct
- in file arc_archive_bundle_iterator.php, method ArcArchiveBundleIterator::__construct()
Creates an arc archive iterator with the given parameters.
- __construct
- in file bloom_filter_file.php, method BloomFilterFile::__construct()
Initializes the fields of the BloomFilter and its base PersistentStructure.
- __construct
- in file bzip2_block_iterator.php, method BZip2BlockIterator::__construct()
Creates a new iterator of a bz2 file by opening the file, doing a
- __construct
- in file controller.php, method Controller::__construct()
- __construct
- in file configure_tool.php, method ConfigureTool::__construct()
To change configuration parameters of Yioop, this program invokes AdminController methods. These methods expect data passed to them in superglobals set up as a result of an HTTP request. This program fakes the settings of these variables.
- __construct
- in file gzip_compressor.php, method GzipCompressor::__construct()
Constructor does nothing
- __construct
- in file hash_table.php, method HashTable::__construct()
Makes a persistently stored (i.e., on disk and in RAM) hash table using the supplied parameters
- __construct
- in file mix_archive_bundle_iterator.php, method MixArchiveBundleIterator::__construct()
Creates a web archive iterator with the given parameters.
- __construct
- in file mirror.php, method Mirror::__construct()
Sets up the field variables so that syncing can begin
- __construct
- in file mediawiki_bundle_iterator.php, method MediaWikiArchiveBundleIterator::__construct()
Creates a media wiki archive iterator with the given parameters.
- __construct
- in file model.php, method Model::__construct()
Sets up the database manager that will be used and name of the search engine database
- __construct
- in file mysql_manager.php, method MysqlManager::__construct()
- __construct
- in file network_iterator.php, method NetworkIterator::__construct()
Creates a network iterator with the given parameters.
- __construct
- in file negation_iterator.php, method NegationIterator::__construct()
Creates a negation iterator with the given parameters.
- __construct
- in file machine_model.php, method MachineModel::__construct()
- __construct
- in file locale_model.php, method LocaleModel::__construct()
- __construct
- in file index_archive_bundle.php, method IndexArchiveBundle::__construct()
Makes or initializes an IndexArchiveBundle with the provided parameters
- __construct
- in file indexing_plugin.php, method IndexingPlugin::__construct()
Builds an IndexingPlugin object. Loads in the appropriate
- __construct
- in file helper.php, method Helper::__construct()
The constructor at this point does nothing
- __construct
- in file index_dictionary.php, method IndexDictionary::__construct()
Makes an index dictionary with the given name
- __construct
- in file index_shard.php, method IndexShard::__construct()
Makes an index shard with the given file name and generation offset
- __construct
- in file intersect_iterator.php, method IntersectIterator::__construct()
Creates an intersect iterator with the given parameters.
- __construct
- in file activity_model.php, method ActivityModel::__construct()
- __construct
- in file layout.php, method Layout::__construct()
The constructor sets the view that will be drawn inside the Layout.
- __wakeup
- in file bzip2_block_iterator.php, method BZip2BlockIterator::__wakeup()
Called by unserialize prior to execution