seek_quarry

Procedural File: utility.php

Source Location: /lib/utility.php



Page Details:

SeekQuarry/Yioop -- Open Source Pure PHP Search Engine, Crawler, and Indexer

Copyright (C) 2009 - 2013 Chris Pollett chris@pollett.org

LICENSE:

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.

END LICENSE

A library of string, log, hash, time, and conversion functions




Tags:

author:  Chris Pollett chris@pollett.org
copyright:  2009 - 2013
link:  http://www.seekquarry.com/
filesource:  Source Code for this file
license:  GPL3








addRegexDelimiters [line 43]

string addRegexDelimiters( string $expression)

Adds delimiters to a regex that may or may not have them



Tags:

return:  regex with delimiters added if they were not already there


Parameters

string   $expression   a regex
[ Top ]



base64Hash [line 670]

string base64Hash( string $string)

Converts a crawl hash number to a base64-like encoding, chosen so that the result does not get confused in urls or DBs



Tags:

return:  the encoded hash


Parameters

string   $string   a hash to base64 encode
[ Top ]



calculatePartition [line 796]

int calculatePartition( string $input, int $num_partition, [object $callback = NULL])

Used by a controller to say which queue_server should receive a given input




Tags:

return:  id of server responsible for input


Parameters

string   $input   can be viewed as a key that might be processed by a queue_server. For example, in some cases input might be a url and we want to determine which queue_server should be responsible for queuing that url
int   $num_partition   number of queue_servers to choose between
object   $callback   function or static method that might be applied to input before deciding the responsible queue_server. For example, if input was a url we might want to get the host before deciding on the queue_server
[ Top ]
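
The partitioning idea described above can be sketched as follows. This is a minimal illustration, not the library's implementation: partitionSketch and $host_of are hypothetical names, and crc32 is an assumed stand-in for whatever hash the library actually uses.

```php
<?php
/**
 * Hypothetical sketch: map a key to one of $num_partition queue_servers.
 * crc32 is an assumption, chosen only because it is built in.
 */
function partitionSketch(string $input, int $num_partition, ?callable $callback = null): int
{
    // Optionally normalize the key first, e.g. reduce a url to its host.
    if ($callback !== null) {
        $input = $callback($input);
    }
    // Mask to a non-negative value so the modulus is a valid server id.
    return (crc32($input) & 0x7fffffff) % $num_partition;
}

// Hypothetical callback: keep only the host part of a url.
$host_of = function (string $url): string {
    $host = parse_url($url, PHP_URL_HOST);
    return is_string($host) ? $host : $url;
};

// Urls on the same host land on the same queue_server id.
echo partitionSketch("http://www.example.com/a", 4, $host_of), "\n";
echo partitionSketch("http://www.example.com/b", 4, $host_of), "\n";
```

Hashing the host rather than the full url keeps all pages of one site on one queue_server, which is the use case the $callback description mentions.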



changeInMicrotime [line 823]

float changeInMicrotime( string $start, [string $end = NULL])

Measures the change in time in seconds between two timestamps to microsecond precision



Tags:

return:  time difference in seconds


Parameters

string   $start   starting time with microseconds
string   $end   ending time with microseconds, if null use current time
[ Top ]
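
A minimal sketch of such a time difference, assuming the timestamps are strings in the "usec sec" format returned by PHP's microtime() when called without arguments. The name microtimeDiffSketch is hypothetical.

```php
<?php
/**
 * Hypothetical sketch: seconds elapsed between two microtime() strings.
 * Assumes each timestamp looks like "0.25000000 1700000000".
 */
function microtimeDiffSketch(string $start, ?string $end = null): float
{
    if ($end === null) {
        $end = microtime(); // default to the current time
    }
    list($start_usec, $start_sec) = explode(" ", $start);
    list($end_usec, $end_sec) = explode(" ", $end);
    // Subtract whole seconds as integers first to avoid losing precision.
    return ((int)$end_sec - (int)$start_sec)
        + ((float)$end_usec - (float)$start_usec);
}

$start = microtime();
usleep(1000);
echo microtimeDiffSketch($start), "\n";
```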



charCopy [line 65]

void charCopy( string $source, string &$destination, int $start, int $length)

Copies from $source string beginning at position $start, $length many bytes to destination string



Parameters

string   $source   string to copy from
string   &$destination   string to copy to
int   $start   starting offset
int   $length   number of bytes to copy
[ Top ]



convertPixels [line 846]

int convertPixels( string $value)

Converts a CSS unit string into its equivalent in pixels. This is used by SvgProcessor.



Tags:

return:  a number in pixels


Parameters

string   $value   a number followed by a legal CSS unit
[ Top ]



crawlCrypt [line 738]

string crawlCrypt( string $string, [int $salt = NULL])

The search engine project's variation on the Unix crypt function using the crawlHash function instead of DES

The crawlHash function is used to encrypt passwords stored in the database




Tags:

return:  the crypted string where crypting is done using crawlHash


Parameters

string   $string   the string to encrypt
int   $salt   salt value to be used (needed to verify if a password is valid)
[ Top ]



crawlHash [line 644]

string crawlHash( string $string, [bool $raw = false])

Computes an 8 byte hash of a string for use in storing documents.

An eight byte hash was chosen so that the odds of collision even for a few billion documents via the birthday problem are still reasonable. If the raw flag is set to false then an 11 byte base64 encoding of the 8 byte hash is returned. The hash is calculated as the xor of the two halves of the 16 byte md5 of the string. (8 bytes takes less storage which is useful for keeping more doc info in memory)




Tags:

return:  the hash of $string


Parameters

string   $string   the string to hash
bool   $raw   whether to leave raw or base 64 encode
[ Top ]
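
The hash construction described above (xor of the two halves of the md5, optionally base64 encoded to 11 characters) can be sketched as below. hashSketch is a hypothetical name, and the replacement of "+" and "/" by url-safe characters is an assumption made for illustration.

```php
<?php
/**
 * Hypothetical sketch of an 8 byte hash built from md5.
 */
function hashSketch(string $string, bool $raw = false): string
{
    $md5 = md5($string, true); // 16 raw bytes
    // PHP's ^ operator on two strings xors them byte by byte.
    $hash = substr($md5, 0, 8) ^ substr($md5, 8, 8);
    if ($raw) {
        return $hash;
    }
    // 8 bytes encode to 12 base64 chars, the last of which is "=" padding;
    // dropping it leaves 11 chars. The "+/" to "-_" swap is an assumption.
    return rtrim(strtr(base64_encode($hash), "+/", "-_"), "=");
}

echo strlen(hashSketch("http://www.example.com/", true)), "\n"; // 8
echo strlen(hashSketch("http://www.example.com/")), "\n";       // 11
```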



crawlLog [line 587]

void crawlLog( string $msg, [string $lname = NULL])

Logs a message to a logfile or the screen



Parameters

string   $msg   message to log
string   $lname   name of log file in the LOG_DIR directory, rotated logs will also use this as their basename followed by a number followed by bz2 (since they are bzipped).
[ Top ]



decodeModified9 [line 323]

array decodeModified9( string $input_string, int &$offset, [bool $exact = false])

Decodes a sequence of positive integers from a string that has been encoded using Modified 9



Tags:

return:  sequence of positive integers that were decoded
see:  encodeModified9()


Parameters

string   $input_string   string to decode from
int   &$offset   where to start in the string; after decoding, points to the position just past the decoded data
bool   $exact   whether the supplied string is exactly one posting
[ Top ]



deDeltaList [line 212]

array deDeltaList( array &$delta_list)

Given an array of differences of integers, reconstructs the original list. This computes the inverse of the deltaList function



Tags:

return:  a nondecreasing list of integers
see:  deltaList()


Parameters

array   &$delta_list   a list of nonnegative integers
[ Top ]



deleteFileOrDir [line 896]

void deleteFileOrDir( string $file_or_dir)

This is a callback function used in the process of recursively deleting a directory






Parameters

string   $file_or_dir   the filename or directory name to be deleted
[ Top ]



deltaList [line 193]

array deltaList( array $list)

Computes the difference of a list of integers.

i.e., (a1, a2, a3, a4) becomes (a1, a2-a1, a3-a2, a4-a3)




Tags:

return:  the corresponding list of differences of adjacent integers


Parameters

array   $list   a nondecreasing list of integers
[ Top ]
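
As a worked illustration of the difference list and its inverse, here is a minimal sketch. deltaSketch and deDeltaSketch are hypothetical names and operate on copies rather than by reference.

```php
<?php
/** Hypothetical sketch: (a1, a2, a3) becomes (a1, a2-a1, a3-a2). */
function deltaSketch(array $list): array
{
    $deltas = [];
    $previous = 0;
    foreach ($list as $value) {
        $deltas[] = $value - $previous;
        $previous = $value;
    }
    return $deltas;
}

/** Hypothetical sketch of the inverse: running sums of the deltas. */
function deDeltaSketch(array $deltas): array
{
    $list = [];
    $sum = 0;
    foreach ($deltas as $delta) {
        $sum += $delta;
        $list[] = $sum;
    }
    return $list;
}

print_r(deltaSketch([3, 7, 7, 12]));   // 3, 4, 0, 5
print_r(deDeltaSketch([3, 4, 0, 5]));  // 3, 7, 7, 12
```

Because the input is nondecreasing, every delta is nonnegative and usually small, which is what makes the list cheaper to compress.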



docIndexModified9 [line 407]

int docIndexModified9( int $encoded_list)

Given an int encoding a doc_index followed by a position list using Modified 9, extracts just the doc_index.



Tags:

return:  a doc index into an index shard document map.


Parameters

int   $encoded_list   in the just described format
[ Top ]



e [line 1021]

void e( string $text)

shorthand for echo



Parameters

string   $text   string to send to the current output
[ Top ]



encodeModified9 [line 241]

string encodeModified9( array $list)

Encodes a sequence of integers x, such that 1 <= x <= 2^28 - 1 (the largest value that fits in 28 bits), as a string.

The encoded string is a sequence of 4 byte words (packed int's). The high order 2 bits of a given word indicate whether or not to look at the next word. The codes are as follows: 11 start of encoded string, 10 continue four more bytes, 01 end of encoded, and 00 indicates whole sequence encoded in one word.

After the high order 2 bits, the next most significant bits indicate the format of the current word. There are nine possibilities: 00 - 1 28 bit number, 01 - 2 14 bit numbers, 10 - 3 9 bit numbers, 1100 - 4 6 bit numbers, 1101 - 5 5 bit numbers, 1110 - 6 4 bit numbers, 11110 - 7 3 bit numbers, 111110 - 12 2 bit numbers, 111111 - 24 1 bit numbers.




Tags:

return:  encoded string


Parameters

array   $list   a list of positive integers satisfying the above
[ Top ]
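
The nine word formats listed above can be tabulated and queried as in the following sketch. It only illustrates the layout table, not the encoder itself. The function names are hypothetical, and the rule that a b bit field holds values below 2^b is an assumption.

```php
<?php
/**
 * Layouts copied from the format list above, narrowest number first:
 * [format prefix bits, how many numbers, bits per number].
 */
function mod9LayoutsSketch(): array
{
    return [
        ["111111", 24, 1],
        ["111110", 12, 2],
        ["11110", 7, 3],
        ["1110", 6, 4],
        ["1101", 5, 5],
        ["1100", 4, 6],
        ["10", 3, 9],
        ["01", 2, 14],
        ["00", 1, 28],
    ];
}

/**
 * Hypothetical helper: the densest layout whose fields can hold $max_value.
 */
function mod9LayoutForSketch(int $max_value): array
{
    foreach (mod9LayoutsSketch() as $layout) {
        list($prefix, $count, $bits) = $layout;
        if ($max_value < (1 << $bits)) {
            return $layout;
        }
    }
    throw new InvalidArgumentException("value needs more than 28 bits");
}

// Every layout fits in a 32 bit word: 2 continue bits + prefix + payload.
foreach (mod9LayoutsSketch() as list($prefix, $count, $bits)) {
    echo $prefix, ": ", 2 + strlen($prefix) + $count * $bits, " bits used\n";
}
```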



fileInfo [line 925]

array fileInfo( string $file)

This is a callback function used in the process of recursively calculating an array of file modification times and file sizes for a directory



Tags:

return:  array whose single element contains an associative array with the size and modification time of the file


Parameters

string   $file   a name of a file in the file system
[ Top ]



general_is_a [line 1079]

void general_is_a( $class_1, $class_2)

Checks if class_1 is the same as class_2 or has class_2 as a parent. Behaves like the three-parameter version (last parameter true) of the PHP is_a function, which came into being with version 5.3.9.



Parameters

   $class_1  
   $class_2  
[ Top ]
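
A minimal sketch of this check on class names, compared against the built-in three-parameter is_a. generalIsASketch, BaseSketch, and ChildSketch are hypothetical names, and returning a bool is an assumption since the signature above does not state a return type.

```php
<?php
class BaseSketch {}
class ChildSketch extends BaseSketch {}

/**
 * Hypothetical sketch: true if $class_1 equals $class_2 or descends from it.
 * Both arguments are class names given as strings.
 */
function generalIsASketch(string $class_1, string $class_2): bool
{
    return $class_1 === $class_2 || is_subclass_of($class_1, $class_2);
}

var_dump(generalIsASketch("ChildSketch", "BaseSketch")); // bool(true)
var_dump(is_a("ChildSketch", "BaseSketch", true));       // bool(true)
var_dump(generalIsASketch("BaseSketch", "ChildSketch")); // bool(false)
```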



greaterThan [line 1009]

int greaterThan( float $a, float $b)

Callback to check if $a is greater than $b

Used to help sort document results returned in PhraseModel called in IndexArchiveBundle




Tags:

return:  -1 if $a is greater than $b; 1 otherwise
see:  PhraseModel::getTopPhrases()
see:  IndexArchiveBundle::getSelectiveWords()


Parameters

float   $a   first value to compare
float   $b   second value to compare
[ Top ]



lessThan [line 990]

int lessThan( float $a, float $b)

Callback to check if $a is less than $b

Used to help sort document results returned in PhraseModel called in IndexArchiveBundle




Tags:

return:  -1 if $a is less than $b; 1 otherwise
see:  PhraseModel::getPhrasePageResults()
see:  IndexArchiveBundle::getSelectiveWords()


Parameters

float   $a   first value to compare
float   $b   second value to compare
[ Top ]



metricToInt [line 557]

int metricToInt( string $metric_num)

Converts a string of the form some int followed by K, M, or G into its integer equivalent. For example, 4K would become 4000, 16M would become 16000000, and 1G would become 1000000000




Tags:

return:  number the metric string corresponded to


Parameters

string   $metric_num   metric number to convert
[ Top ]
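
The conversion described above can be sketched as follows. metricSketch is a hypothetical name. Accepting lowercase suffixes and treating a string without a suffix as a plain integer are assumptions.

```php
<?php
/** Hypothetical sketch: "4K" becomes 4000, "16M" becomes 16000000. */
function metricSketch(string $metric_num): int
{
    $multipliers = ["K" => 1000, "M" => 1000000, "G" => 1000000000];
    $num = trim($metric_num);
    $suffix = strtoupper(substr($num, -1));
    if (isset($multipliers[$suffix])) {
        // Strip the suffix letter and scale the remaining integer.
        return intval(substr($num, 0, -1)) * $multipliers[$suffix];
    }
    return intval($num); // no recognized suffix
}

echo metricSketch("4K"), "\n";  // 4000
echo metricSketch("16M"), "\n"; // 16000000
echo metricSketch("1G"), "\n";  // 1000000000
```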



orderCallback [line 947]

int orderCallback( string $word_doc_a, string $word_doc_b, [string $order_field = NULL])

Callback function used to sort documents by a field

Should be initialized before using in usort with a call like: orderCallback($tmp, $tmp, "field_want");




Tags:

return:  -1 if first doc bigger, 1 otherwise


Parameters

string   $word_doc_a   doc id of first document to compare
string   $word_doc_b   doc id of second document to compare
string   $order_field   which field of these associative arrays to sort by
[ Top ]
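
The "initialize, then pass to usort" pattern mentioned above can be sketched with a static variable that remembers the field between calls. This is an illustration under assumptions: orderSketch is a hypothetical name, and the documents are assumed to be associative arrays holding numeric fields.

```php
<?php
/**
 * Hypothetical sketch of a comparison callback with a remembered field.
 * A call with a third argument only stores the field name.
 */
function orderSketch($doc_a, $doc_b, $order_field = null)
{
    static $field = "a";
    if ($order_field !== null) {
        $field = $order_field; // initialization call
        return 0;
    }
    // -1 when the first doc is bigger, so larger values sort first.
    return ((float)$doc_a[$field] > (float)$doc_b[$field]) ? -1 : 1;
}

$docs = [["score" => 1.5], ["score" => 9.0], ["score" => 4.2]];
$tmp = [];
orderSketch($tmp, $tmp, "score"); // choose the sort field once
usort($docs, "orderSketch");      // usort only ever passes two arguments
print_r(array_column($docs, "score")); // 9, 4.2, 1.5
```

The static variable is what lets a two-argument usort callback depend on a field chosen at run time.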



packFloat [line 512]

string packFloat( float $my_float)

Packs a float into a 4 char string



Tags:

return:  the packed string


Parameters

float   $my_float   the float to pack
[ Top ]



packInt [line 487]

string packInt( int $my_int)

Packs an int into a 4 char string



Tags:

return:  the packed string


Parameters

int   $my_int   the integer to pack
[ Top ]
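
Packing an int into a 4 char string and reading it back can be sketched with PHP's pack and unpack. The names below are hypothetical, and the big-endian unsigned format code "N" is an assumption about byte order.

```php
<?php
/** Hypothetical sketch: 32 bit int to a 4 byte big-endian string. */
function packIntSketch(int $my_int): string
{
    return pack("N", $my_int);
}

/** Hypothetical sketch of the inverse. */
function unpackIntSketch(string $str): int
{
    $parts = unpack("N", $str); // unpack returns a 1-indexed array
    return $parts[1];
}

$packed = packIntSketch(305419896);   // 0x12345678
echo bin2hex($packed), "\n";          // 12345678
echo unpackIntSketch($packed), "\n";  // 305419896
```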



packListModified9 [line 294]

string packListModified9( int $continue_bits, int $cnt, array $pack_list)

Packs the contents of a single word of a sequence being encoded using Modified9.



Tags:

return:  encoded 4 byte string
see:  encodeModified9()


Parameters

int   $continue_bits   the high order 2 bits of the word
int   $cnt   the number of elements that will be packed in this word
array   $pack_list   a list of positive integers to pack into word
[ Top ]



packPosting [line 122]

string packPosting( int $doc_index, array $position_list, [bool $delta = true])

Makes a packed integer string from a docindex and the positions at which a word occurs in the document with that docindex.



Tags:

return:  a modified9 (our compression scheme) packed string containing this info.


Parameters

int   $doc_index   index (i.e., a count of which document it is rather than a byte offset) of a document in the document string
array   $position_list   integer positions word occurred in that doc
bool   $delta   if true then stores the position_list as a sequence of differences (a delta list)
[ Top ]



partitionByHash [line 767]

array partitionByHash( array $table, string $field, int $num_partition, int $instance, [object $callback = NULL])

Used by a controller to take a table and return those rows in the table that a given queue_server would be responsible for handling



Tags:

return:  the reduced table that the $instance queue_server is responsible for


Parameters

array   $table   an array of rows of associative arrays which a queue_server might need to process
string   $field   column of $table whose values should be used for partitioning
int   $num_partition   number of queue_servers to choose between
int   $instance   the id of the particular server we are interested in
object   $callback   function or static method that might be applied to input before deciding the responsible queue_server. For example, if input was a url we might want to get the host before deciding on the queue_server
[ Top ]



readInput [line 1030]

string readInput( )

Used to read a line of input from the command-line



Tags:

return:  the line read from the command-line


[ Top ]



readMessage [line 1061]

string readMessage( )

Used to read several lines from the terminal, up until a last line consisting of just a "."




Tags:

return:  the lines read from the command-line


[ Top ]



readPassword [line 1044]

string readPassword( )

Used to read a line of input from the command-line (on unix machines without echoing it)




Tags:

return:  the line read from the command-line


[ Top ]



rorderCallback [line 968]

int rorderCallback( string $word_doc_a, string $word_doc_b, [string $order_field = NULL])

Callback function used to sort documents by a field in reverse order

Should be initialized before using in usort with a call like: rorderCallback($tmp, $tmp, "field_want");




Tags:

return:  -1 if first doc bigger, 1 otherwise


Parameters

string   $word_doc_a   doc id of first document to compare
string   $word_doc_b   doc id of second document to compare
string   $order_field   which field of these associative arrays to sort by
[ Top ]



setWorldPermissions [line 912]

void setWorldPermissions( string $file)

This is a callback function used in the process of recursively chmoding to 777 all files in a folder



Tags:

see:  DatasourceManager::setWorldPermissionsRecursive()


Parameters

string   $file   the filename or directory name to be chmoded
[ Top ]



toBinString [line 540]

string toBinString( string $str)

Converts a string to a string in which each char has been replaced by its binary equivalent



Tags:

return:  the binary string


Parameters

string   $str   what we want rewritten in binary
[ Top ]



toHexString [line 524]

string toHexString( string $str)

Converts a string to a string in which each char has been replaced by its hexadecimal equivalent



Tags:

return:  the hexified string


Parameters

string   $str   what we want rewritten in hex
[ Top ]
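
For debugging packed strings, conversions like toHexString and toBinString can be sketched as below. The function names are hypothetical, and the zero padding and single-space separator are assumptions about the output format.

```php
<?php
/** Hypothetical sketch: each byte as two hex digits, space separated. */
function toHexSketch(string $str): string
{
    $out = [];
    for ($i = 0; $i < strlen($str); $i++) {
        $out[] = sprintf("%02x", ord($str[$i]));
    }
    return implode(" ", $out);
}

/** Hypothetical sketch: each byte as eight binary digits. */
function toBinSketch(string $str): string
{
    $out = [];
    for ($i = 0; $i < strlen($str); $i++) {
        $out[] = sprintf("%08b", ord($str[$i]));
    }
    return implode(" ", $out);
}

echo toHexSketch("AB"), "\n"; // 41 42
echo toBinSketch("A"), "\n";  // 01000001
```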



unbase64Hash [line 686]

string unbase64Hash( string $base64)

Decodes a crawl hash number from base64 to raw ASCII



Tags:

return:  the decoded hash


Parameters

string   $base64   a hash to decode
[ Top ]



unpackFloat [line 499]

float unpackFloat( string $str)

Unpacks a float from a 4 char string



Tags:

return:  extracted float


Parameters

string   $str   where to extract float from
[ Top ]



unpackInt [line 471]

int unpackInt( string $str)

Unpacks an int from a 4 char string



Tags:

return:  extracted integer


Parameters

string   $str   where to extract int from
[ Top ]



unpackListModified9 [line 357]

array unpackListModified9( string $encoded_list)

Decodes a single word with the high two bits off according to Modified 9



Tags:

return:  sequence of integers that results from the decoding.


Parameters

string   $encoded_list   4 byte string to decode
[ Top ]



unpackPosting [line 160]

array unpackPosting( string $posting, int &$offset, [bool $dedelta = true], [bool $exact = false])

Given a packed integer string, uses the top three bytes to calculate a doc_index of a document in the shard, and uses the low order byte to compute a number of occurrences of a word in that document.



Tags:

return:  array consisting of integer doc_index and a subarray consisting of integer positions of word in doc.


Parameters

string   $posting   a string containing a doc index position list pair encoded using modified9
int   &$offset   an offset into the string where the modified9 posting is encoded
bool   $dedelta   if true then assumes the list is a sequence of differences (a delta list) and undoes the difference to get the original sequence
bool   $exact   whether the supplied string is exactly one posting
[ Top ]



vByteDecode [line 98]

int vByteDecode( string &$str, int &$offset)

Decodes from a string using variable byte coding an integer.



Tags:

return:  the decoded integer


Parameters

string   &$str   string to use for decoding
int   &$offset   byte offset into the string where the variable byte integer is stored
[ Top ]



vByteEncode [line 80]

string vByteEncode( int $pos_int)

Encodes an integer using variable byte coding.



Tags:

return:  a string of 1-5 chars depending on how big $pos_int was


Parameters

int   $pos_int   integer to encode
[ Top ]
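
Variable byte coding in general can be sketched as below. This is one common convention, not necessarily the one this library uses: 7 payload bits per byte, least significant group first, with the high bit set on every byte except the last. Both function names are hypothetical.

```php
<?php
/** Hypothetical sketch: encode a nonnegative int in 7 bit groups. */
function vByteEncodeSketch(int $pos_int): string
{
    $result = "";
    while ($pos_int >= 0x80) {
        // Low 7 bits with the continuation flag set.
        $result .= chr(($pos_int & 0x7f) | 0x80);
        $pos_int >>= 7;
    }
    return $result . chr($pos_int); // final byte has the flag clear
}

/** Hypothetical sketch: decode one int and advance $offset past it. */
function vByteDecodeSketch(string $str, int &$offset): int
{
    $value = 0;
    $shift = 0;
    do {
        $byte = ord($str[$offset++]);
        $value |= ($byte & 0x7f) << $shift;
        $shift += 7;
    } while ($byte & 0x80);
    return $value;
}

$encoded = vByteEncodeSketch(5) . vByteEncodeSketch(300);
$offset = 0;
echo vByteDecodeSketch($encoded, $offset), "\n"; // 5
echo vByteDecodeSketch($encoded, $offset), "\n"; // 300
```

Under this convention a 32 bit value needs at most five bytes, which matches the 1-5 char range stated above.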



webdecode [line 719]

string webdecode( string $str)

Decodes a string encoded by webencode



Tags:

return:  decoded string


Parameters

string   $str   string to decode
[ Top ]



webencode [line 704]

string webencode( string $str)

Encodes a string in a format suitable for post data (mainly, base64, but str_replace data that might mess up post in result)



Tags:

return:  encoded string


Parameters

string   $str   string to encode
[ Top ]
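
A minimal sketch of a post-safe encoding in the spirit described above: base64, followed by replacing the characters that can cause trouble in post data. The function names are hypothetical and the particular replacement characters are an assumption.

```php
<?php
/** Hypothetical sketch: base64 with "+", "/" and "=" swapped out. */
function webencodeSketch(string $str): string
{
    return strtr(base64_encode($str), ["+" => "-", "/" => "_", "=" => "~"]);
}

/** Hypothetical sketch of the inverse. */
function webdecodeSketch(string $str): string
{
    $decoded = base64_decode(strtr($str, ["-" => "+", "_" => "/", "~" => "="]));
    return $decoded === false ? "" : $decoded;
}

$encoded = webencodeSketch("\xfb\xff\xfe");
echo $encoded, "\n";                                       // -__-
var_dump(webdecodeSketch($encoded) === "\xfb\xff\xfe");    // bool(true)
```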



Documentation generated by phpDocumentor 1.4.3