Constants

RESULTS_PER_BLOCK

RESULTS_PER_BLOCK

Default maximum number of documents returned for each block

SYNC_TIMEOUT

SYNC_TIMEOUT

Number of seconds after which syncGenDocOffsetsAmongstIterators times out and is stopped if it is running slowly

Properties

$num_docs

$num_docs : integer

Estimate of the number of documents that this iterator can return

Type

integer

$seen_docs

$seen_docs : integer

The number of documents already iterated over

Type

integer

$count_block

$count_block : integer

The number of documents in the current block

Type

integer

$pages

$pages : array

Cache of what currentDocsWithWord returns

Type

array

$current_block_fresh

$current_block_fresh : boolean

Says whether the value in $this->count_block is up to date

Type

boolean

$results_per_block

$results_per_block : integer

Maximum number of documents returned for each block

Type

integer

$index_bundle_iterators

$index_bundle_iterators : array

An array of iterators whose intersection we get documents from

Type

array

$num_iterators

$num_iterators : integer

Number of elements in $this->index_bundle_iterators

Type

integer

$seen_docs_unfiltered

$seen_docs_unfiltered : integer

The number of iterated docs before the restriction test

Type

integer

$to_advance_index

$to_advance_index : integer

Index of the iterator amongst those we are intersecting to advance next

Type

integer

$word_iterator_map

$word_iterator_map : array

Associative array (term position in original query => iterator index of an iterator for that term). This is to handle queries where the same term occurs multiple times. For example, the rock band "The The"

Type

array
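As an illustration of the mapping described above, the following sketch shows how a repeated term could share one iterator. The function name buildWordIteratorMapSketch and the way iterator indices are assigned are assumptions made for this example; the library builds the real map elsewhere and may do so differently.

```php
<?php
/**
 * Hypothetical helper: maps each term position in the query to the index
 * of the iterator for that term, reusing the index for repeated terms.
 */
function buildWordIteratorMapSketch(array $query_terms): array
{
    $term_to_iterator = [];   // term => iterator index (assumed numbering)
    $word_iterator_map = [];  // term position => iterator index
    foreach ($query_terms as $position => $term) {
        if (!isset($term_to_iterator[$term])) {
            // First time this term is seen: give it the next iterator index.
            $term_to_iterator[$term] = count($term_to_iterator);
        }
        $word_iterator_map[$position] = $term_to_iterator[$term];
    }
    return $word_iterator_map;
}
```

For the query "The The", both positions would map to iterator 0, so only one underlying iterator is needed even though the map has two entries.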

$num_words

$num_words : integer

Number of elements in $this->word_iterator_map

Type

integer

$quote_positions

$quote_positions : array

Each element in this array corresponds to one quoted phrase in the original query. Each element is in turn an array with elements corresponding to a position of a term in the original query followed by its length (a term might involve more than one word, so the length could be greater than one). Entries may also be of the form *num => * to indicate that an asterisk (a wild card that can match any number of terms) appeared at that place in the query

Type

array

$weight

$weight : float

A weighting factor to multiply with each doc SCORE returned from this iterator

Type

float

$sync_timer_on

$sync_timer_on : boolean

Whether to run a timer that shuts down the intersect iterator if syncGenDocOffsetsAmongstIterators takes longer than the timeout period

Type

boolean

$sync_time

$sync_time : integer

Start time for syncGenDocOffsetsAmongstIterators

Type

integer

Methods

reset()

reset() 

Returns the iterators to the first document block that they could iterate over

advance()

advance(array  $gen_doc_offset = null) 

Forwards the iterator one group of docs

Parameters

array $gen_doc_offset

a (generation, doc_offset) pair. If set, the next block must be of a greater than or equal generation, and if the generation is equal, all of its doc_offsets must be larger than or equal to this value
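The effect of the optional $gen_doc_offset argument can be sketched on a plain sorted list of [generation, doc_offset] pairs. This is a simplification under stated assumptions: the real iterator advances whole blocks over an index, while the hypothetical advanceSketch below moves a cursor over single postings.

```php
<?php
/**
 * Hypothetical sketch: moves $cursor forward by one posting, then keeps
 * skipping postings that sort before $gen_doc_offset (if one is given).
 * Returns the new cursor, which may equal count($postings) when exhausted.
 */
function advanceSketch(array $postings, int $cursor,
    ?array $gen_doc_offset = null): int
{
    $cursor++; // always move at least one step forward
    if ($gen_doc_offset !== null) {
        // PHP compares equal-length arrays element by element, so this
        // orders pairs by generation first and doc_offset second.
        while ($cursor < count($postings) &&
            ($postings[$cursor] <=> $gen_doc_offset) < 0) {
            $cursor++;
        }
    }
    return $cursor;
}
```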

currentGenDocOffsetWithWord()

currentGenDocOffsetWithWord() : mixed

Gets the doc_offset and generation for the next document that would be returned by this iterator

Returns

mixed —

an array with the desired document offset and generation; -1 on fail

findDocsWithWord()

findDocsWithWord() : mixed

Hook function used by currentDocsWithWord to return the current block of docs if it is not cached

Returns

mixed —

doc ids and rank if there are docs left, -1 otherwise

plan()

plan() : string

Returns a string representation of a plan by which the current iterator finds its results

Returns

string —

a representation of the current iterator and its subiterators, useful for determining how a query will be processed

genDocOffsetCmp()

genDocOffsetCmp(array  $gen_doc1, array  $gen_doc2, integer  $direction = self::ASCENDING) : integer

Compares two arrays each containing a (generation, offset) pair.

Parameters

array $gen_doc1

first ordered pair

array $gen_doc2

second ordered pair

integer $direction

whether the comparison should be done for a self::ASCENDING or a self::DESCENDING search

Returns

integer —

-1, 0, or 1 depending on which is bigger
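A minimal sketch of such a comparison, assuming each pair is a two-element array [generation, offset], that -1 means the first pair comes earlier, and that a descending search simply flips the sign. The constants below are stand-in values for this example, not necessarily those defined in CrawlConstants, and genDocOffsetCmpSketch is a hypothetical name.

```php
<?php
// Stand-in direction constants for this sketch (hypothetical values).
const ASCENDING = 1;
const DESCENDING = -1;

/**
 * Compares two [generation, offset] pairs lexicographically.
 * Returns -1, 0, or 1; the sign is flipped for a descending search.
 */
function genDocOffsetCmpSketch(array $gen_doc1, array $gen_doc2,
    int $direction = ASCENDING): int
{
    // Compare generations first, then offsets within the same generation.
    $cmp = $gen_doc1[0] <=> $gen_doc2[0];
    if ($cmp === 0) {
        $cmp = $gen_doc1[1] <=> $gen_doc2[1];
    }
    return ($direction === ASCENDING) ? $cmp : -$cmp;
}
```

Under these assumptions, a pair in an earlier generation always sorts first in an ascending search, regardless of its offset.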

getDirection()

getDirection() : integer

Returns CrawlConstants::ASCENDING or CrawlConstants::DESCENDING depending on the direction in which this iterator traverses the underlying index archive bundle.

Returns

integer —

direction traversing underlying archive bundle

currentDocsWithWord()

currentDocsWithWord() : mixed

Gets the current block of doc ids and scores associated with this iterator's word

Returns

mixed —

doc ids and score if there are docs left, -1 otherwise
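The relationship between currentDocsWithWord(), the findDocsWithWord() hook, and the $pages, $count_block, and $current_block_fresh fields can be sketched as a simple cache. The property and method names follow this page, but the method bodies and both class names are assumptions made for illustration, not the library's code.

```php
<?php
/** Sketch of the cache-then-hook pattern; bodies are assumptions. */
abstract class CachedBlockIteratorSketch
{
    protected $pages = [];
    protected $count_block = 0;
    protected $current_block_fresh = false;

    /** Hook: computes the current block, or -1 if no docs are left. */
    abstract protected function findDocsWithWord();

    public function currentDocsWithWord()
    {
        if (!$this->current_block_fresh) {
            // Cache miss: ask the hook, then remember the result.
            $this->pages = $this->findDocsWithWord();
            $this->count_block =
                is_array($this->pages) ? count($this->pages) : 0;
            $this->current_block_fresh = true;
        }
        return $this->pages;
    }

    public function advance()
    {
        // Moving on invalidates the cached block.
        $this->current_block_fresh = false;
    }
}

/** Hypothetical concrete iterator over pre-built blocks, for demonstration. */
class ArrayBlockIteratorSketch extends CachedBlockIteratorSketch
{
    public $hook_calls = 0;
    private $blocks;
    private $index = 0;

    public function __construct(array $blocks)
    {
        $this->blocks = $blocks;
    }

    protected function findDocsWithWord()
    {
        $this->hook_calls++;
        return $this->blocks[$this->index] ?? -1;
    }

    public function advance()
    {
        $this->index++;
        parent::advance();
    }
}
```

Calling currentDocsWithWord() twice without an intervening advance() invokes the hook only once in this sketch, which is the point of keeping the cached block.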

getCurrentDocsForKeys()

getCurrentDocsForKeys(array  $keys = null) : array

Gets the summaries associated with the keys, provided the keys can be found in the current block of docs returned by this iterator

Parameters

array $keys

keys to try to find in the current block of returned results

Returns

array —

doc summaries that match provided keys

nextDocsWithWord()

nextDocsWithWord(  $doc_offset = null) : array

Gets the current block of doc summaries for the word iterator and advances the current pointer to the next block of documents. If a doc index is set, the next block must be of docs after this doc_index

Parameters

$doc_offset

if set the next block must all have $doc_offsets equal to or larger than this value

Returns

array —

doc summaries matching the $this->restrict_phrases

advanceSeenDocs()

advanceSeenDocs() 

Updates the seen_docs count during an advance() call

setResultsPerBlock()

setResultsPerBlock(integer  $num) 

This method is supposed to set the value of the results_per_block field, which controls the maximum number of results that can be returned in one go by currentDocsWithWord(). It cannot be implemented consistently for this iterator and still behave nicely if this iterator is used together with union_iterator. So, to prevent a user from doing this, calling this method results in a user-defined error

Parameters

integer $num

the maximum number of results that can be returned by a block

__construct()

__construct(object  $index_bundle_iterators, array  $word_iterator_map, array  $quote_positions = null, float  $weight = 1) 

Creates an intersect iterator with the given parameters.

Parameters

object $index_bundle_iterators

to use as a source of documents to iterate over

array $word_iterator_map

Associative array (term position in original query => iterator index of an iterator for that term)

array $quote_positions

Each element in this array corresponds to one quoted phrase in the original query. See the $quote_positions field variable in this class for more info

float $weight

multiplicative factor to apply to scores returned from this iterator

checkQuotes()

checkQuotes(array  $position_lists) : boolean

Used to check if quoted terms in search query appear exactly in the position lists of the current document

Parameters

array $position_lists

of search terms in the current document

Returns

boolean —

whether the quoted terms in the search appear exactly

checkQuote()

checkQuote(array  $position_lists, integer  $cur_pos, mixed  $next_pos,   $qp) : integer

Auxiliary function for checkQuotes, used to check if quoted terms in search query appear exactly in the position lists of the current document

Parameters

array $position_lists

of search terms in the current document

integer $cur_pos

position to look after in any position list

mixed $next_pos

* or int. If *, next_pos must be >= $cur_pos + len_search_term. $next_pos represents the position the next quoted term should be at

$qp

$position_list_index => $len_of_list_term pairs

Returns

integer —

-1 on failure, 0 on backtrack, 1 on success
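The kind of check performed by checkQuotes and checkQuote can be sketched as a recursive search over position lists. The input format here is a simplification invented for the example: a phrase is a list whose items are either [position_list_index, term_length] or the string '*', rather than the $quote_positions layout the class uses. The function name is hypothetical, and it returns a boolean instead of the -1/0/1 codes.

```php
<?php
/**
 * Hypothetical sketch: does the phrase occur in the document?
 * $phrase items are [list_index, length] or '*' (any number of terms).
 * $expected is the position the next term must start at (null = anywhere).
 */
function quoteMatchesSketch(array $position_lists, array $phrase,
    int $k = 0, ?int $expected = null): bool
{
    $exact = true;
    // A wild card relaxes "exactly at $expected" to "at or after $expected".
    while ($k < count($phrase) && $phrase[$k] === '*') {
        $exact = false;
        $k++;
    }
    if ($k === count($phrase)) {
        return true; // every term of the phrase has been placed
    }
    [$list_index, $len] = $phrase[$k];
    foreach ($position_lists[$list_index] as $pos) {
        $fits = ($expected === null) ||
            ($exact ? $pos === $expected : $pos >= $expected);
        // Try this occurrence; backtrack to the next one if the rest fails.
        if ($fits && quoteMatchesSketch($position_lists, $phrase,
            $k + 1, $pos + $len)) {
            return true;
        }
    }
    return false;
}
```

The backtracking matters for repeated terms: for a phrase such as "the the", the first occurrence of "the" may not be followed by another one, so later occurrences have to be tried as well.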

computeProximity()

computeProximity(array  &$word_position_lists, array  &$word_len_lists, boolean  $is_doc) : float

Given the position_lists of a collection of terms computes a score for how close those words were in the given document

Parameters

array &$word_position_lists

a 2D array item number => position_list (locations in doc where item occurred) for that item.

array &$word_len_lists

length for each item of its position list

boolean $is_doc

whether this is the position list of a document or a link

Returns

float —

sum of the inverses of all covers computed by a plane sweep algorithm
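A minimal sketch of a plane sweep that sums the inverses of cover lengths. It assumes that a cover is a minimal span of positions containing at least one occurrence of every term and that its length is last position minus first position plus one. It ignores the $word_len_lists and $is_doc arguments entirely, so it illustrates the general technique rather than the library's scoring; proximitySketch is a hypothetical name.

```php
<?php
/**
 * Hypothetical sketch: $word_position_lists maps item => sorted positions.
 * Returns the sum of 1 / length over all minimal covers.
 */
function proximitySketch(array $word_position_lists): float
{
    $num_terms = count($word_position_lists);
    // Flatten to [position, term] events and sweep them left to right.
    $events = [];
    foreach ($word_position_lists as $term => $positions) {
        foreach ($positions as $pos) {
            $events[] = [$pos, $term];
        }
    }
    usort($events, function ($a, $b) {
        return $a <=> $b;
    });
    $counts = array_fill_keys(array_keys($word_position_lists), 0);
    $covered = 0;  // number of distinct terms inside the window
    $left = 0;     // index of the window's left event
    $score = 0.0;
    foreach ($events as [$pos, $term]) {
        if ($counts[$term]++ === 0) {
            $covered++;
        }
        if ($covered < $num_terms) {
            continue;
        }
        // Shrink from the left while the left term occurs again inside.
        while ($counts[$events[$left][1]] > 1) {
            $counts[$events[$left][1]]--;
            $left++;
        }
        $score += 1.0 / ($pos - $events[$left][0] + 1);
        // Drop the left end so the sweep can find the next cover.
        $counts[$events[$left][1]]--;
        $covered--;
        $left++;
    }
    return $score;
}
```

With two terms at positions [1, 10] and [3, 11], the sweep in this sketch finds the covers 1 to 3, 3 to 10, and 10 to 11, so tighter groupings of the terms contribute more to the score.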

syncGenDocOffsetsAmongstIterators()

syncGenDocOffsetsAmongstIterators() 

Finds the next generation and doc offset amongst all the iterators that contain the word. It assumes that the (generation, doc offset) pairs are ordered in an increasing fashion for the underlying iterators
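The synchronization step can be sketched on plain arrays: each underlying iterator is modeled as an ascending list of [generation, doc offset] pairs, and lagging lists are advanced until every list points at the same pair. The function name, the array-based model, and the 2-second default timeout are assumptions for this example; the actual value of SYNC_TIMEOUT is not given on this page.

```php
<?php
/**
 * Hypothetical sketch: finds the first [generation, offset] pair present
 * in every ascending list. Returns the pair, or -1 if there is none or
 * the (assumed) timeout is exceeded.
 */
function syncSketch(array $lists, float $timeout_seconds = 2.0)
{
    if (count($lists) === 0) {
        return -1;
    }
    $start = microtime(true);
    $cursors = array_fill(0, count($lists), 0);
    while (true) {
        if (microtime(true) - $start > $timeout_seconds) {
            return -1; // give up if syncing is too slow
        }
        // The largest current pair is the target all lists must reach.
        $max = null;
        foreach ($lists as $i => $list) {
            if (!isset($list[$cursors[$i]])) {
                return -1; // one list is exhausted: no more common pairs
            }
            $cur = $list[$cursors[$i]];
            if ($max === null || ($cur <=> $max) > 0) {
                $max = $cur;
            }
        }
        // Advance each lagging list up to (at least) the target.
        $all_equal = true;
        foreach ($lists as $i => $list) {
            while (isset($list[$cursors[$i]]) &&
                ($list[$cursors[$i]] <=> $max) < 0) {
                $cursors[$i]++;
            }
            if (!isset($list[$cursors[$i]])) {
                return -1;
            }
            if ($list[$cursors[$i]] !== $max) {
                $all_equal = false; // overshot: retry with a new target
            }
        }
        if ($all_equal) {
            return $max;
        }
    }
}
```

Because every list is sorted, a list never needs to move backwards, which is why the increasing-order assumption stated above is essential to this style of intersection.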