\seekquarry\yioop\library\index_bundle_iterators\WordIterator

Used to iterate through the documents associated with a word in an IndexArchiveBundle. It also makes it easy to get the summaries of these documents.

A description of how words and the documents containing them are stored is given in the documentation of IndexArchiveBundle.

Summary

Public methods: reset(), advance(), currentGenDocOffsetWithWord(), findDocsWithWord(), plan(), genDocOffsetCmp(), getDirection(), currentDocsWithWord(), getCurrentDocsForKeys(), nextDocsWithWord(), advanceSeenDocs(), setResultsPerBlock(), __construct(), plainAdvance(), advanceGeneration()

Public properties: $num_docs, $seen_docs, $count_block, $pages, $current_block_fresh, $results_per_block, $word_key, $base64_word_key, $is_meta, $index_name, $start_generation, $no_more_generations, $next_offset, $dictionary_info, $num_generations, $generation_pointer, $current_generation, $current_offset, $start_offset, $last_offset, $empty, $filter, $current_doc_offset

Constants: RESULTS_PER_BLOCK, HOST_KEY_POS, KEY_LEN

Protected methods and properties: none found

Private methods and properties: none found

Constants

RESULTS_PER_BLOCK

RESULTS_PER_BLOCK

Default number of documents returned for each block (at most)

HOST_KEY_POS

HOST_KEY_POS

Host key position + 1 (the first char says whether this is a doc, an inlink, or an external link)

KEY_LEN

KEY_LEN

Length of a doc key

Properties

$num_docs

$num_docs : integer

Estimate of the number of documents that this iterator can return

Type

integer

$seen_docs

$seen_docs : integer

The number of documents already iterated over

Type

integer

$count_block

$count_block : integer

The number of documents in the current block

Type

integer

$pages

$pages : array

Cache of what currentDocsWithWord returns

Type

array

$current_block_fresh

$current_block_fresh : boolean

Says whether the value in $this->count_block is up to date

Type

boolean

$results_per_block

$results_per_block : integer

Number of documents returned for each block (at most)

Type

integer

$word_key

$word_key : string

Hash of the word or phrase that the iterator iterates over

Type

string

$base64_word_key

$base64_word_key : string

Word key above in our modified base 64 encoding

Type

string

$is_meta

$is_meta : string

Whether word key corresponds to a meta word

Type

string

$index_name

$index_name : string

The timestamp of the index that is associated with this iterator

Type

string

$start_generation

$start_generation : integer

First shard generation that word info was obtained for

Type

integer

$no_more_generations

$no_more_generations : boolean

Used to keep track of whether getWordInfo might still get more data on the search terms as generations are advanced

Type

boolean

$next_offset

$next_offset : integer

The next byte offset in the IndexShard

Type

integer

$dictionary_info

$dictionary_info : array

An array of shard generation and posting list offsets, lengths, and numbers of documents

Type

array

$num_generations

$num_generations : integer

The total number of shards that have data for this word

Type

integer

$generation_pointer

$generation_pointer : integer

Index into dictionary_info corresponding to the current shard

Type

integer

$current_generation

$current_generation : integer

Number of the current shard

Type

integer

$current_offset

$current_offset : integer

The current byte offset in the IndexShard

Type

integer

$start_offset

$start_offset : integer

Starting Offset of word occurrence in the IndexShard

Type

integer

$last_offset

$last_offset : integer

Last Offset of word occurrence in the IndexShard

Type

integer

$empty

$empty : integer

Keeps track of whether the word_iterator list is empty because the word does not appear in the index shard

Type

integer

$filter

$filter : \seekquarry\yioop\library\index_bundle_iterators\SearchfiltersModel

Model responsible for keeping track of edited and deleted search results

Type

\seekquarry\yioop\library\index_bundle_iterators\SearchfiltersModel

$current_doc_offset

$current_doc_offset : integer

The current value of the doc_offset of current posting if known

Type

integer

Methods

reset()

reset() 

Resets the iterator to the first document block that it could iterate over

advance()

advance(array  $gen_doc_offset = null) 

Forwards the iterator one group of docs

Parameters

array $gen_doc_offset

a (generation, doc_offset) pair. If set, the next block must be of greater than or equal generation, and if the generation is equal, the docs in the next block must all have $doc_offsets larger than or equal to this value
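
The constraint on $gen_doc_offset can be pictured with a small standalone sketch. It is not Yioop code: the function skipToGenDocOffset and the in-memory list of postings are hypothetical, and each posting is assumed to be a plain [generation, doc_offset] pair.

```php
<?php
/**
 * Sketch only: keeps the postings that are at or after $gen_doc_offset.
 * Each posting is assumed to be a [generation, doc_offset] pair.
 */
function skipToGenDocOffset(array $postings, array $gen_doc_offset): array
{
    $kept = [];
    foreach ($postings as $posting) {
        // A later generation always qualifies; within the same generation
        // the doc_offset must be larger than or equal to the target.
        $later_generation = $posting[0] > $gen_doc_offset[0];
        $same_generation_far_enough = $posting[0] === $gen_doc_offset[0]
            && $posting[1] >= $gen_doc_offset[1];
        if ($later_generation || $same_generation_far_enough) {
            $kept[] = $posting;
        }
    }
    return $kept;
}

$postings = [[0, 10], [1, 4], [1, 20], [2, 0]];
$remaining = skipToGenDocOffset($postings, [1, 10]);
```

Here only the postings [1, 20] and [2, 0] survive, since [1, 4] is in the target generation but has a smaller doc_offset.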

currentGenDocOffsetWithWord()

currentGenDocOffsetWithWord() : mixed

Gets the doc_offset and generation for the next document that would be returned by this iterator

Returns

mixed —

an array with the desired document offset and generation; -1 on fail

findDocsWithWord()

findDocsWithWord() : mixed

Hook function used by currentDocsWithWord to return the current block of docs if it is not cached

Returns

mixed —

doc ids and score if there are docs left, -1 otherwise

plan()

plan() : string

Returns a string representation of a plan by which the current iterator finds its results

Returns

string —

a representation of the current iterator and its subiterators, useful for determining how a query will be processed

genDocOffsetCmp()

genDocOffsetCmp(array  $gen_doc1, array  $gen_doc2, integer  $direction = self::ASCENDING) : integer

Compares two arrays each containing a (generation, offset) pair.

Parameters

array $gen_doc1

first ordered pair

array $gen_doc2

second ordered pair

integer $direction

whether the comparison should be done for a self::ASCENDING or a self::DESCENDING search

Returns

integer —

-1, 0, or 1 depending on which is bigger
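
As a rough illustration of such a comparison, the sketch below orders two pairs by generation first and by offset second. The function compareGenDocOffsets and the constants ASCENDING_DIR and DESCENDING_DIR are hypothetical names, and the assumption that a descending search simply reverses the sign is not confirmed by this page; the actual method may differ.

```php
<?php
// Hypothetical direction constants; the real ones come from CrawlConstants.
const ASCENDING_DIR = 1;
const DESCENDING_DIR = -1;

/**
 * Sketch only: compares two [generation, offset] pairs, generation first.
 * Assumes a descending search just reverses the ascending ordering.
 */
function compareGenDocOffsets(array $gen_doc1, array $gen_doc2,
    int $direction = ASCENDING_DIR): int
{
    // Compare generations; fall back to offsets when they tie.
    $cmp = $gen_doc1[0] <=> $gen_doc2[0];
    if ($cmp === 0) {
        $cmp = $gen_doc1[1] <=> $gen_doc2[1];
    }
    return ($direction === DESCENDING_DIR) ? -$cmp : $cmp;
}

$result = compareGenDocOffsets([2, 100], [3, 5]);
```

In this sketch the pair [2, 100] sorts before [3, 5] in an ascending search because the generation is compared before the offset.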

getDirection()

getDirection() : integer

Returns CrawlConstants::ASCENDING or CrawlConstants::DESCENDING depending on the direction in which this iterator traverses the underlying index archive bundle.

Returns

integer —

direction traversing underlying archive bundle

currentDocsWithWord()

currentDocsWithWord() : mixed

Gets the current block of doc ids and scores associated with this iterator's word

Returns

mixed —

doc ids and score if there are docs left, -1 otherwise

getCurrentDocsForKeys()

getCurrentDocsForKeys(array  $keys = null) : array

Gets the summaries associated with the given keys, provided the keys can be found in the current block of docs returned by this iterator

Parameters

array $keys

keys to try to find in the current block of returned results

Returns

array —

doc summaries that match provided keys

nextDocsWithWord()

nextDocsWithWord(  $doc_offset = null) : array

Gets the current block of doc summaries for the word iterator and advances the current pointer to the next block of documents. If a doc index is given, the next block must consist of docs after this doc_index

Parameters

$doc_offset

if set the next block must all have $doc_offsets equal to or larger than this value

Returns

array —

doc summaries matching the $this->restrict_phrases
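
The relationship between currentDocsWithWord(), advance(), and nextDocsWithWord() can be sketched with a stand-in class. StubBlockIterator is hypothetical and serves blocks from an in-memory array rather than from an IndexArchiveBundle; returning -1 from nextDocsWithWord() once the docs run out is also an assumption, modelled on currentDocsWithWord().

```php
<?php
/**
 * Hypothetical stand-in for a word iterator, used only to show the
 * "get current block, then advance" pattern.
 */
class StubBlockIterator
{
    private array $blocks;
    private int $position = 0;

    public function __construct(array $docs, int $results_per_block)
    {
        // Split the docs into blocks of at most $results_per_block entries.
        $this->blocks = array_chunk($docs, max(1, $results_per_block), true);
    }

    /** Returns the current block, or -1 when no docs are left. */
    public function currentDocsWithWord()
    {
        return $this->blocks[$this->position] ?? -1;
    }

    /** Moves on to the next block. */
    public function advance(): void
    {
        $this->position++;
    }

    /** Returns the current block and then advances past it. */
    public function nextDocsWithWord()
    {
        $block = $this->currentDocsWithWord();
        if ($block !== -1) {
            $this->advance();
        }
        return $block;
    }
}

// Three docs with made-up scores, read two per block.
$iterator = new StubBlockIterator(
    ['doc_a' => 0.9, 'doc_b' => 0.7, 'doc_c' => 0.4], 2);
$seen = [];
while (($block = $iterator->nextDocsWithWord()) !== -1) {
    $seen[] = array_keys($block);
}
```

A caller written this way never has to call advance() itself; the loop ends once the stub reports -1.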

advanceSeenDocs()

advanceSeenDocs() 

Updates the seen_docs count during an advance() call

setResultsPerBlock()

setResultsPerBlock(integer  $num) 

Sets the value of the results_per_block field. This field controls the maximum number of results that can be returned in one go by currentDocsWithWord()

Parameters

integer $num

the maximum number of results that can be returned by a block

__construct()

__construct(string  $word_key, string  $index_name, boolean  $raw = false, \seekquarry\yioop\library\index_bundle_iterators\SearchfiltersModel  $filter = null, integer  $results_per_block = \seekquarry\yioop\library\index_bundle_iterators\IndexBundleIterator::RESULTS_PER_BLOCK, integer  $direction = self::ASCENDING) 

Creates a word iterator with the given parameters.

Parameters

string $word_key

hash of word or phrase to iterate docs of

string $index_name

time_stamp of the index to use

boolean $raw

whether the $word_key is encoded in our variant of base64

\seekquarry\yioop\library\index_bundle_iterators\SearchfiltersModel $filter

Model responsible for keeping track of edited and deleted search results

integer $results_per_block

the maximum number of results that can be returned by a findDocsWithWord call

integer $direction

the order in which results accessed from $index_name should be presented. self::ASCENDING is from first added to last added; self::DESCENDING is from last added to first added. Note: this value is not saved permanently, so you could in theory open two read-only versions of the same bundle and read the results in different directions

plainAdvance()

plainAdvance() 

Forwards the iterator one group of docs. This is what is called by advance($gen_doc_offset) if $gen_doc_offset is null

advanceGeneration()

advanceGeneration(integer  $generation = null) 

Switches which index shard is being used to return occurrences of the word to the next shard containing the word

Parameters

integer $generation

generation to advance beyond
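
One way to picture the shard switch is as a walk over $dictionary_info. The sketch below is an assumption-laden illustration rather than the class's code: the function nextGenerationPointer is hypothetical, each entry is assumed to be a [generation, offset, length, num_docs] array, and entries are assumed to be sorted by ascending generation.

```php
<?php
/**
 * Sketch only: returns the index of the first entry at or after $pointer
 * whose generation is strictly greater than $generation, or the entry
 * right after $pointer when no generation is given.
 * Returns -1 if no later shard contains the word.
 */
function nextGenerationPointer(array $dictionary_info, int $pointer,
    ?int $generation = null): int
{
    if ($generation === null) {
        // No target generation: just step to the following entry.
        $pointer++;
    } else {
        // Skip every entry whose generation is not beyond the target.
        while (isset($dictionary_info[$pointer]) &&
            $dictionary_info[$pointer][0] <= $generation) {
            $pointer++;
        }
    }
    return isset($dictionary_info[$pointer]) ? $pointer : -1;
}

// Made-up entries for shards 0, 2, and 5.
$info = [[0, 12, 40, 10], [2, 8, 16, 4], [5, 100, 24, 6]];
$next = nextGenerationPointer($info, 0, 2);
```

With these made-up entries, advancing beyond generation 2 lands on the entry for shard 5, which sits at index 2.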