seek_quarry

Class: WordIterator

Source Location: /lib/index_bundle_iterators/word_iterator.php

Class Overview

IndexBundleIterator
   |
   --WordIterator

Used to iterate through the documents associated with a word in an IndexArchiveBundle. It also makes it easy to get the summaries of these documents.


Author(s):

  • Chris Pollett


Inherited Methods

Class: IndexBundleIterator

IndexBundleIterator::advance()
Forwards the iterator one group of docs
IndexBundleIterator::advanceSeenDocs()
Updates the seen_docs count during an advance() call
IndexBundleIterator::computeRelevance()
Computes a relevancy score for a posting offset with respect to this iterator and generation
IndexBundleIterator::currentDocsWithWord()
Gets the current block of doc ids and scores associated with this iterator's word
IndexBundleIterator::currentGenDocOffsetWithWord()
Gets the doc_offset and generation for the next document that would be returned by this iterator
IndexBundleIterator::findDocsWithWord()
Hook function used by currentDocsWithWord to return the current block of docs if it is not cached
IndexBundleIterator::genDocOffsetCmp()
Compares two arrays each containing a (generation, offset) pair.
IndexBundleIterator::getCurrentDocsForKeys()
Gets the summaries associated with the provided keys
IndexBundleIterator::nextDocsWithWord()
Gets the current block of doc summaries for the word iterator and advances the current pointer to the next block of documents. If a doc index is given, the next block must consist of docs after this doc_index
IndexBundleIterator::reset()
Returns the iterator to the first document block that it could iterate over
IndexBundleIterator::setResultsPerBlock()
Sets the value of the result_per_block field. This field controls the maximum number of results that can be returned in one go by currentDocsWithWord()

Class Details

[line 54]
Used to iterate through the documents associated with a word in an IndexArchiveBundle. It also makes it easy to get the summaries of these documents.

A description of how words and the documents containing them are stored is given in the documentation of IndexArchiveBundle.




Tags:

author:  Chris Pollett
see:  IndexArchiveBundle


[ Top ]


Class Variables

$current_generation =

[line 105]

Number of the current shard


Type:   int


[ Top ]

$current_offset =

[line 111]

The current byte offset in the IndexShard


Type:   int


[ Top ]

$dictionary_info =

[line 78]

An array of shard generations and posting list offsets, lengths, and numbers of documents



Type:   array


[ Top ]

$empty =

[line 130]

Keeps track of whether the word_iterator list is empty because the word does not appear in the index shard



Type:   int


[ Top ]

$feed_info =

[line 87]


Type:   mixed


[ Top ]

$feed_shard_name =

[line 83]


Type:   mixed


[ Top ]

$filter =

[line 137]

An array of hashes of domains to filter from results


Type:   array


[ Top ]

$generation_pointer =

[line 99]

Index into dictionary_info corresponding to the current shard


Type:   int


[ Top ]

$index_name =

[line 65]

The timestamp of the index associated with this iterator


Type:   string


[ Top ]

$last_offset =

[line 123]

Last offset of the word's occurrences in the IndexShard


Type:   int


[ Top ]

$next_offset =

[line 71]

The next byte offset in the IndexShard


Type:   int


[ Top ]

$num_generations =

[line 93]

The total number of shards that have data for this word


Type:   int


[ Top ]

$start_offset =

[line 117]

Starting offset of the word's occurrences in the IndexShard


Type:   int


[ Top ]

$word_key =

[line 60]

Hash of the word that the iterator iterates over


Type:   string


[ Top ]



Class Methods


constructor __construct [line 156]

WordIterator __construct( string $word_key, string $index_name, [bool $raw = false], [ &$filter = NULL], int $limit, array $filter)

Creates a word iterator with the given parameters.



Parameters:

string   $word_key   hash of word or phrase to iterate docs of
string   $index_name   time_stamp of the index to use
int   $limit   the first element to return from the list of docs iterated over
bool   $raw   whether the $word_key is encoded in our variant of base64
array   $filter   an array of hashes of domains to filter from results
   &$filter  

[ Top ]

method advance [line 424]

void advance( [array $gen_doc_offset = null])

Forwards the iterator one group of docs



Overrides IndexBundleIterator::advance() (Forwards the iterator one group of docs)

Parameters:

array   $gen_doc_offset   a (generation, doc_offset) pair. If set, the next block must be of greater than or equal generation, and if the generation is equal, all docs in the next block must have doc_offsets larger than or equal to this value
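
The ordering that the $gen_doc_offset parameter relies on can be illustrated with a small, self-contained sketch. The function name compareGenDocOffset is hypothetical and is not part of this class; the sketch only assumes that each pair is stored as array(generation, doc_offset) and that generation takes precedence over offset, as described above.

```php
<?php
/**
 * Hypothetical helper (not the library's genDocOffsetCmp): compares two
 * (generation, doc_offset) pairs, each assumed to be stored as
 * [generation, doc_offset].
 * Returns -1 if $a comes before $b, 0 if they are equal, 1 otherwise.
 */
function compareGenDocOffset(array $a, array $b): int
{
    // Generation dominates; offsets only matter within one generation.
    if ($a[0] !== $b[0]) {
        return $a[0] < $b[0] ? -1 : 1;
    }
    return $a[1] <=> $b[1];
}

// A block is acceptable for a target pair when it does not come before it.
$target = [2, 16];
$later_in_same_generation = compareGenDocOffset([2, 40], $target) >= 0;
$earlier_generation = compareGenDocOffset([1, 900], $target) >= 0;
```

Here a block starting at generation 2, offset 40 is acceptable for the target, while one in generation 1 is not, regardless of how large its offset is.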

[ Top ]

method advanceGeneration [line 478]

void advanceGeneration( [int $generation = null])

Switches which index shard is being used to return occurrences of the word to the next shard containing the word



Parameters:

int   $generation   generation to advance beyond
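
As a rough illustration of moving between shards, the sketch below models $dictionary_info and $generation_pointer with plain arrays. The row layout (generation, first offset, last offset, number of docs), the function name nextGenerationPointer, and the -1 return value are all assumptions made for this example; the actual method returns void and updates the iterator's own fields.

```php
<?php
/**
 * Toy model of advancing to the next shard that contains the word.
 * Each row of $dictionary_info is assumed (hypothetically) to be
 * [generation, first_offset, last_offset, num_docs], sorted by generation.
 * Returns the new pointer into $dictionary_info, or -1 if no shard is left.
 */
function nextGenerationPointer(
    array $dictionary_info,
    int $pointer,
    ?int $generation = null
): int {
    $num_generations = count($dictionary_info);
    if ($generation === null) {
        // No target given: step to the following shard.
        $pointer++;
    } else {
        // Skip every shard whose generation is not beyond the target.
        while ($pointer < $num_generations
            && $dictionary_info[$pointer][0] <= $generation) {
            $pointer++;
        }
    }
    return $pointer < $num_generations ? $pointer : -1;
}

$info = [[0, 128, 512, 97], [3, 64, 200, 35], [7, 16, 48, 9]];
$p = nextGenerationPointer($info, 0);    // moves from generation 0 to 3
$q = nextGenerationPointer($info, 0, 3); // skips generations 0 and 3
$r = nextGenerationPointer($info, 2);    // nothing after generation 7
```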

[ Top ]

method advanceSeenDocs [line 393]

void advanceSeenDocs( )

Updates the seen_docs count during an advance() call



Overrides IndexBundleIterator::advanceSeenDocs() (Updates the seen_docs count during an advance() call)

[ Top ]

method computeRelevance [line 245]

float computeRelevance( int $generation, int $posting_offset)

Computes a relevancy score for a posting offset with respect to this iterator and generation




Tags:

return:  a relevancy score based on BM25F.


Overrides IndexBundleIterator::computeRelevance() (Computes a relevancy score for a posting offset with respect to this iterator and generation)

Parameters:

int   $generation   the generation the posting offset is for
int   $posting_offset   an offset into word_docs to compute the relevance of
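
For readers unfamiliar with the BM25 family, the following sketch shows the textbook single-field BM25 score for one term. It is not the code of computeRelevance(): BM25F additionally combines weighted term frequencies from several fields, and this page does not say which fields, weights, or constants are used. The function name bm25TermScore, its parameters, and the defaults k1 = 1.2 and b = 0.75 are assumptions chosen for illustration.

```php
<?php
/**
 * Textbook BM25 score of a single term in a single document.
 * Hypothetical helper for illustration only.
 *
 * $tf             occurrences of the term in the document
 * $doc_len        length of the document in terms
 * $avg_doc_len    average document length in the collection
 * $num_docs       number of documents in the collection
 * $docs_with_term number of documents containing the term
 */
function bm25TermScore(
    int $tf,
    int $doc_len,
    float $avg_doc_len,
    int $num_docs,
    int $docs_with_term,
    float $k1 = 1.2,
    float $b = 0.75
): float {
    // Smoothed inverse document frequency; never negative in this form.
    $idf = log(1 + ($num_docs - $docs_with_term + 0.5)
        / ($docs_with_term + 0.5));
    // Longer-than-average documents are penalized, shorter ones boosted.
    $length_norm = 1 - $b + $b * $doc_len / $avg_doc_len;
    // Term frequency saturates as $tf grows.
    return $idf * ($tf * ($k1 + 1)) / ($tf + $k1 * $length_norm);
}

$short_doc = bm25TermScore(2, 50, 100.0, 1000, 10);
$long_doc = bm25TermScore(2, 200, 100.0, 1000, 10);
```

With the same term frequency, the shorter document receives the higher score, which is the length normalization that the b parameter controls.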

[ Top ]

method currentGenDocOffsetWithWord [line 510]

mixed currentGenDocOffsetWithWord( )

Gets the doc_offset and generation for the next document that would be returned by this iterator



Tags:

return:  an array with the desired document offset and generation; -1 on fail


Overrides IndexBundleIterator::currentGenDocOffsetWithWord() (Gets the doc_offset and generation for the next document that would be returned by this iterator)

[ Top ]

method findDocsWithWord [line 314]

mixed findDocsWithWord( )

Hook function used by currentDocsWithWord to return the current block of docs if it is not cached



Tags:

return:  doc ids and score if there are docs left, -1 otherwise


Overrides IndexBundleIterator::findDocsWithWord() (Hook function used by currentDocsWithWord to return the current block of docs if it is not cached)

[ Top ]

method reset [line 279]

void reset( )

Returns the iterator to the first document block that it could iterate over




Overrides IndexBundleIterator::reset() (Returns the iterator to the first document block that it could iterate over)

[ Top ]


Class Constants

HOST_KEY_POS =  17

[line 140]

Host key position + 1 (the first char says whether the key is for a doc, an inlink, or an external link)


[ Top ]

KEY_LEN =  8

[line 143]

Length of a doc key
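
One plausible way the two constants work together with the $filter array is sketched below. The key layout used here (one type character, followed by two 8-character parts, followed by an 8-character host part) is an assumption inferred from the constant descriptions, and the helper names hostKeyOf and isFiltered are hypothetical; none of this is confirmed by this page.

```php
<?php
// Values copied from the class constants documented above.
const HOST_KEY_POS = 17;
const KEY_LEN = 8;

/**
 * Hypothetical helper: extracts the host part from a key, assuming it
 * occupies KEY_LEN characters starting at HOST_KEY_POS.
 */
function hostKeyOf(string $doc_key): string
{
    return substr($doc_key, HOST_KEY_POS, KEY_LEN);
}

/**
 * Hypothetical helper: true if the key's host part is one of the
 * filtered domain hashes.
 */
function isFiltered(string $doc_key, array $filter): bool
{
    return in_array(hostKeyOf($doc_key), $filter, true);
}

// Fake key under the assumed layout: type char + two parts + host part.
$doc_key = "d" . "AAAAAAAA" . "BBBBBBBB" . "HOSTHASH";
$filtered = isFiltered($doc_key, ["HOSTHASH"]);
$kept = !isFiltered($doc_key, ["OTHERKEY"]);
```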


[ Top ]



Documentation generated by phpDocumentor 1.4.3