Procedural File: utility.php
Source Location: /lib/utility.php
Page Details:
SeekQuarry/Yioop -- Open Source Pure PHP Search Engine, Crawler, and Indexer
Copyright (C) 2009 - 2013 Chris Pollett chris@pollett.org
LICENSE:
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.
END LICENSE
A library of string, log, hash, time, and conversion functions
Tags:
addRegexDelimiters [line 43]
string addRegexDelimiters(
string $expression)
|
|
Adds delimiters to a regex that may or may not have them
Tags:
Parameters
| string |
$expression |
a regex |
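The following is a minimal sketch of one way such a check could work; it is not the library's implementation. The name sketchAddDelimiters, the rule for deciding that a pattern is already delimited, and the list of candidate delimiters are all assumptions made for illustration. Patterns with trailing modifiers (such as /abc/i) or bracket-style delimiters are not recognized by this heuristic.

```php
<?php
// Hypothetical helper for illustration only; not part of utility.php.
function sketchAddDelimiters(string $expression): string
{
    $len = strlen($expression);
    // Assumption: a pattern counts as already delimited when it has at
    // least two characters and starts and ends with the same character,
    // and that character is legal as a PCRE delimiter.
    if ($len > 1) {
        $first = $expression[0];
        $last = $expression[$len - 1];
        $legal = !ctype_alnum($first) && !ctype_space($first)
            && $first !== "\\";
        if ($first === $last && $legal) {
            return $expression;
        }
    }
    // Prefer a delimiter that does not occur inside the pattern.
    foreach (["/", "@", "#", "~"] as $delimiter) {
        if (strpos($expression, $delimiter) === false) {
            return $delimiter . $expression . $delimiter;
        }
    }
    // Every candidate occurs in the pattern: use "/" and escape it.
    return "/" . str_replace("/", "\\/", $expression) . "/";
}
```

Choosing a delimiter that is absent from the pattern avoids having to escape anything inside it.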
base64Hash [line 670]
string base64Hash(
string $string)
|
|
Converts a crawl hash to a string that is close to base64 encoded, but adjusted so that it does not get confused in URLs or databases
Tags:
Parameters
| string |
$string |
a hash to base64 encode |
calculatePartition [line 796]
int calculatePartition(
string $input, int $num_partition, [object $callback = NULL])
|
|
Used by a controller to say which queue_server should receive a given input
Tags:
Parameters
| string |
$input |
can be viewed as a key that might be processed by a queue_server. For example, in some cases input might be a url and we want to determine which queue_server should be responsible for queuing that url |
| int |
$num_partition |
number of queue_servers to choose between |
| object |
$callback |
function or static method that might be applied to input before deciding the responsible queue_server. For example, if input was a url we might want to get the host before deciding on the queue_server |
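As a rough illustration of the idea, the sketch below hashes the (optionally transformed) input and reduces it modulo the number of partitions. The function name sketchPartition and the use of md5 as the hash are assumptions; the library may use a different hash.

```php
<?php
// Hypothetical sketch; the hash function used here is an assumption.
function sketchPartition(string $input, int $numPartition,
    ?callable $callback = null): int
{
    if ($callback !== null) {
        // e.g. reduce a url to its host so that all urls of one host
        // are handled by the same queue_server
        $input = $callback($input);
    }
    // 7 hex digits = 28 bits, so the value is a non-negative int
    // on both 32 and 64 bit builds of PHP.
    $hash = hexdec(substr(md5($input), 0, 7));
    return $hash % $numPartition;
}

// Example callback: extract the host of a url.
$host = function (string $url): string {
    return (string) parse_url($url, PHP_URL_HOST);
};
$index = sketchPartition("http://example.com/a", 4, $host);
```

Because the callback maps both http://example.com/a and http://example.com/b to the same host, both urls land in the same partition.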
changeInMicrotime [line 823]
float changeInMicrotime(
string $start, [string $end = NULL])
|
|
Measures the change in time in seconds between two timestamps to microsecond precision
Tags:
Parameters
| string |
$start |
starting time with microseconds |
| string |
$end |
ending time with microseconds, if null use current time |
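A sketch of how such a difference might be computed, assuming both timestamps are strings in the "msec sec" format returned by PHP's microtime() when called without arguments. The name sketchChangeInMicrotime is hypothetical.

```php
<?php
// Hypothetical sketch; assumes timestamps in microtime()'s "msec sec" format.
function sketchChangeInMicrotime(string $start, ?string $end = null): float
{
    if ($end === null) {
        $end = microtime();
    }
    list($startMicro, $startSec) = explode(" ", $start);
    list($endMicro, $endSec) = explode(" ", $end);
    // Subtract whole seconds and fractional parts separately so the
    // large second counts do not swamp the microsecond precision.
    return ((int) $endSec - (int) $startSec)
        + ((float) $endMicro - (float) $startMicro);
}

$elapsed = sketchChangeInMicrotime("0.25000000 100", "0.75000000 101");
```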
charCopy [line 65]
void charCopy(
string $source, string &$destination, int $start, int $length)
|
|
Copies from $source string beginning at position $start, $length many bytes to destination string
Parameters
| string |
$source |
string to copy from |
| string |
&$destination |
string to copy to |
| int |
$start |
starting offset |
| int |
$length |
number of bytes to copy |
convertPixels [line 846]
int convertPixels(
string $value)
|
|
Converts a CSS unit string into its equivalent in pixels. This is used by @see SvgProcessor.
Tags:
Parameters
| string |
$value |
a number followed by a legal CSS unit |
crawlCrypt [line 738]
string crawlCrypt(
string $string, [int $salt = NULL])
|
|
The search engine project's variation on the Unix crypt function, using the crawlHash function instead of DES. The crawlHash function is used to encrypt passwords stored in the database
Tags:
Parameters
| string |
$string |
the string to encrypt |
| int |
$salt |
salt value to be used (needed to verify if a password is valid) |
crawlHash [line 644]
string crawlHash(
string $string, [bool $raw = false])
|
|
Computes an 8 byte hash of a string for use in storing documents. An eight byte hash was chosen so that the odds of collision even for a few billion documents via the birthday problem are still reasonable. If the raw flag is set to false then an 11 byte base64 encoding of the 8 byte hash is returned. The hash is calculated as the xor of the two halves of the 16 byte md5 of the string. (8 bytes takes less storage which is useful for keeping more doc info in memory)
Tags:
Parameters
| string |
$string |
the string to hash |
| bool |
$raw |
whether to leave raw or base 64 encode |
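The description above can be sketched as follows. The name sketchCrawlHash is hypothetical, and the particular character substitutions used to make the base64 text safe for urls are an assumption, not taken from the source.

```php
<?php
// Hypothetical sketch of the hashing scheme described above.
function sketchCrawlHash(string $string, bool $raw = false): string
{
    $md5 = md5($string, true); // 16 raw bytes
    // The ^ operator on two strings performs a bytewise XOR.
    $hash = substr($md5, 0, 8) ^ substr($md5, 8, 8);
    if ($raw) {
        return $hash;
    }
    // 8 bytes encode to 12 base64 chars, the last of which is "="
    // padding; dropping it leaves 11 chars.
    $encoded = rtrim(base64_encode($hash), "=");
    // Assumption: replace the characters that are awkward in urls.
    return strtr($encoded, "+/", "-_");
}
```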
crawlLog [line 587]
void crawlLog(
string $msg, [string $lname = NULL])
|
|
Logs a message to a logfile or the screen
Parameters
| string |
$msg |
message to log |
| string |
$lname |
name of log file in the LOG_DIR directory, rotated logs will also use this as their basename followed by a number followed by bz2 (since they are bzipped). |
decodeModified9 [line 323]
array decodeModified9(
string $input_string, int &$offset, [bool $exact = false])
|
|
Decodes a sequence of positive integers from a string that has been encoded using Modified 9
Tags:
Parameters
| string |
$input_string |
string to decode from |
| int |
&$offset |
where to start in the string; after decoding, points to where decoding stopped |
| bool |
$exact |
whether the supplied string is exactly one posting |
deDeltaList [line 212]
array deDeltaList(
array &$delta_list)
|
|
Given an array of differences of integers reconstructs the original list. This computes the inverse of the deltaList function
Tags:
Parameters
| array |
&$delta_list |
a list of nonnegative integers |
deleteFileOrDir [line 896]
void deleteFileOrDir(
string $file_or_dir)
|
|
This is a callback function used in the process of recursively deleting a directory
Tags:
Parameters
| string |
$file_or_dir |
the filename or directory name to be deleted |
deltaList [line 193]
array deltaList(
array $list)
|
|
Computes the differences of a list of integers, i.e., (a1, a2, a3, a4) becomes (a1, a2-a1, a3-a2, a4-a3)
Tags:
Parameters
| array |
$list |
a nondecreasing list of integers |
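A minimal sketch of the delta transformation and its inverse, matching the (a1, a2-a1, a3-a2, ...) pattern described for deltaList. The names sketchDeltaList and sketchDeDeltaList are hypothetical, and unlike the documented deDeltaList, the inverse here returns a new array instead of working on a reference.

```php
<?php
// Hypothetical sketch: differences of consecutive entries.
function sketchDeltaList(array $list): array
{
    $deltas = [];
    $previous = 0;
    foreach ($list as $value) {
        $deltas[] = $value - $previous;
        $previous = $value;
    }
    return $deltas;
}

// Hypothetical sketch: running sums undo the differences.
function sketchDeDeltaList(array $deltas): array
{
    $list = [];
    $sum = 0;
    foreach ($deltas as $delta) {
        $sum += $delta;
        $list[] = $sum;
    }
    return $list;
}
```

For a nondecreasing input every difference is nonnegative, and the differences are typically much smaller than the original values, which is what makes them cheaper to encode.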
docIndexModified9 [line 407]
int docIndexModified9(
int $encoded_list)
|
|
Given an int encoding a doc_index followed by a position list using Modified 9, extracts just the doc_index.
Tags:
Parameters
| int |
$encoded_list |
in the just described format |
e [line 1021]
shorthand for echo
Parameters
| string |
$text |
string to send to the current output |
encodeModified9 [line 241]
string encodeModified9(
array $list)
|
|
Encodes a sequence of integers x, such that 1 <= x <= 2^28 - 1, as a string. The encoded string is a sequence of 4 byte words (packed ints). The high order 2 bits of a given word indicate whether or not to look at the next word. The codes are as follows: 11 - start of encoded string, 10 - continue four more bytes, 01 - end of encoded string, and 00 - whole sequence encoded in one word. After the high order 2 bits, the next most significant bits indicate the format of the current word. There are nine possibilities: 00 - 1 28 bit number, 01 - 2 14 bit numbers, 10 - 3 9 bit numbers, 1100 - 4 6 bit numbers, 1101 - 5 5 bit numbers, 1110 - 6 4 bit numbers, 11110 - 7 3 bit numbers, 111110 - 12 2 bit numbers, 111111 - 24 1 bit numbers.
Tags:
Parameters
| array |
$list |
a list of positive integers satisfying the above |
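To make the nine word formats concrete, the sketch below picks, for the front of a list, the densest format whose slot width fits every number it would hold. This greedy selection and the name sketchChooseFormat are assumptions for illustration; the source does not say how the library chooses a format.

```php
<?php
// Hypothetical sketch; the greedy choice below is an assumption.
// Expects a list indexed 0..n-1 of positive integers.
function sketchChooseFormat(array $list): array
{
    // numbers per word => bits available for each number
    $formats = [24 => 1, 12 => 2, 7 => 3, 6 => 4, 5 => 5,
        4 => 6, 3 => 9, 2 => 14, 1 => 28];
    foreach ($formats as $count => $bits) {
        if (count($list) < $count) {
            continue; // not enough numbers left to fill this format
        }
        $limit = 1 << $bits;
        $fits = true;
        for ($i = 0; $i < $count; $i++) {
            if ($list[$i] >= $limit) {
                $fits = false;
                break;
            }
        }
        if ($fits) {
            return [$count, $bits];
        }
    }
    throw new InvalidArgumentException(
        "List is empty or holds a value needing more than 28 bits");
}
```

Each format's payload (count times bits) plus its format prefix and the 2 continue bits fits in one 32 bit word, for example 2 + 6 + 24 * 1 = 32.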
fileInfo [line 925]
array fileInfo(
string $file)
|
|
This is a callback function used in the process of recursively calculating an array of file modification times and file sizes for a directory
Tags:
Parameters
| string |
$file |
a name of a file in the file system |
general_is_a [line 1079]
void general_is_a(
$class_1, $class_2)
|
|
Checks if class_1 is the same as class_2 or has class_2 as a parent. Behaves like the 3 param version (last param true) of the PHP is_a function that came into being with Version 5.3.9.
Parameters
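A sketch of the kind of check described, assuming both arguments are class names given as strings. The function name sketchIsA and the two example classes are hypothetical.

```php
<?php
// Hypothetical sketch; assumes both arguments are class name strings.
function sketchIsA(string $class1, string $class2): bool
{
    // Class names are case-insensitive in PHP, hence strcasecmp.
    return strcasecmp($class1, $class2) === 0
        || is_subclass_of($class1, $class2);
}

// Example classes used only for this illustration.
class SketchBase {}
class SketchChild extends SketchBase {}

$child_is_base = sketchIsA("SketchChild", "SketchBase");
```

On PHP versions that support it, is_a($class1, $class2, true) gives the same answer for these inputs.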
greaterThan [line 1009]
int greaterThan(
float $a, float $b)
|
|
Callback to check if $a is greater than $b. Used to help sort document results returned in PhraseModel called in IndexArchiveBundle
Tags:
Parameters
| float |
$a |
first value to compare |
| float |
$b |
second value to compare |
lessThan [line 990]
int lessThan(
float $a, float $b)
|
|
Callback to check if $a is less than $b. Used to help sort document results returned in PhraseModel called in IndexArchiveBundle
Tags:
Parameters
| float |
$a |
first value to compare |
| float |
$b |
second value to compare |
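For readers unfamiliar with comparison callbacks, here is a sketch of how a callback of this kind is typically passed to usort. The name sketchLessThan and its -1/0/1 return convention are assumptions; the source only states that the callback returns an int.

```php
<?php
// Hypothetical comparison callback; the return convention is an assumption.
function sketchLessThan(float $a, float $b): int
{
    if ($a == $b) {
        return 0;
    }
    return ($a < $b) ? -1 : 1;
}

$scores = [0.7, 0.2, 0.9];
// usort calls the callback on pairs of elements to decide their order.
usort($scores, "sketchLessThan");
```

With this convention the array ends up in ascending order; swapping the -1 and 1 would give descending order.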
metricToInt [line 557]
int metricToInt(
string $metric_num)
|
|
Converts a string of the form some int followed by K, M, or G. into its integer equivalent. For example 4K would become 4000, 16M would become 16000000, and 1G would become 1000000000
Tags:
Parameters
| string |
$metric_num |
metric number to convert |
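The conversion can be sketched as below, using the decimal multipliers given in the examples (4K becomes 4000). The name sketchMetricToInt, the acceptance of lower case suffixes, and the handling of empty input are assumptions.

```php
<?php
// Hypothetical sketch using the decimal multipliers from the examples.
function sketchMetricToInt(string $metricNum): int
{
    $multipliers = ["K" => 1000, "M" => 1000000, "G" => 1000000000];
    $metricNum = trim($metricNum);
    if ($metricNum === "") {
        return 0; // assumption: empty input maps to 0
    }
    $suffix = strtoupper(substr($metricNum, -1));
    if (isset($multipliers[$suffix])) {
        return ((int) substr($metricNum, 0, -1)) * $multipliers[$suffix];
    }
    return (int) $metricNum; // no suffix: plain integer
}
```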
orderCallback [line 947]
int orderCallback(
string $word_doc_a, string $word_doc_b, [string $order_field = NULL])
|
|
Callback function used to sort documents by a field. Should be initialized before use in usort with a call like: orderCallback($tmp, $tmp, "field_want");
Tags:
Parameters
| string |
$word_doc_a |
doc id of first document to compare |
| string |
$word_doc_b |
doc id of second document to compare |
| string |
$order_field |
which field of these associative arrays to sort by |
packFloat [line 512]
string packFloat(
float $my_float)
|
|
Packs a float into a 4 char string
Tags:
Parameters
| float |
$my_float |
the float to pack |
packInt [line 487]
string packInt(
int $my_int)
|
|
Packs an int into a 4 char string
Tags:
Parameters
| int |
$my_int |
the integer to pack |
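A sketch of packing an int into a 4 char string and reading it back with PHP's pack and unpack. The big-endian unsigned format code "N" and the names sketchPackInt and sketchUnpackInt are assumptions; the source does not state the byte order.

```php
<?php
// Hypothetical sketch; the "N" (32 bit big-endian) format is an assumption.
function sketchPackInt(int $value): string
{
    return pack("N", $value);
}

// Expects $str to hold at least 4 bytes.
function sketchUnpackInt(string $str): int
{
    // unpack returns an array indexed from 1 for unnamed formats
    $parts = unpack("N", substr($str, 0, 4));
    return $parts[1];
}
```

A fixed 4 byte width lets a reader jump straight to the n-th int in a long packed string without scanning.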
packListModified9 [line 294]
string packListModified9(
int $continue_bits, int $cnt, array $pack_list)
|
|
Packs the contents of a single word of a sequence being encoded using Modified9.
Tags:
Parameters
| int |
$continue_bits |
the high order 2 bits of the word |
| int |
$cnt |
the number of elements that will be packed in this word |
| array |
$pack_list |
a list of positive integers to pack into word |
packPosting [line 122]
string packPosting(
int $doc_index, array $position_list, [bool $delta = true])
|
|
Makes a packed integer string from a docindex and the number of occurrences of a word in the document with that docindex.
Tags:
Parameters
| int |
$doc_index |
index (i.e., a count of which document it is rather than a byte offset) of a document in the document string |
| array |
$position_list |
integer positions at which the word occurred in that doc |
| bool |
$delta |
if true then stores the position_list as a sequence of differences (a delta list) |
partitionByHash [line 767]
array partitionByHash(
array $table, string $field, int $num_partition, int $instance, [object $callback = NULL])
|
|
Used by a controller to take a table and return those rows in the table that a given queue_server would be responsible for handling
Tags:
Parameters
| array |
$table |
an array of rows of associative arrays which a queue_server might need to process |
| string |
$field |
column of $table whose values should be used for partitioning |
| int |
$num_partition |
number of queue_servers to choose between |
| int |
$instance |
the id of the particular server we are interested in |
| object |
$callback |
function or static method that might be applied to input before deciding the responsible queue_server. For example, if input was a url we might want to get the host before deciding on the queue_server |
readInput [line 1030]
Used to read a line of input from the command-line
Tags:
readMessage [line 1061]
Used to read several lines from the terminal, up to a final line consisting of just a "."
Tags:
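A sketch of the read-until-dot loop. To keep it testable, this hypothetical sketchReadMessage takes a stream resource as a parameter, which is an assumption; the documented function reads from the terminal.

```php
<?php
// Hypothetical sketch; reads from any stream, e.g. STDIN on the command line.
function sketchReadMessage($stream): string
{
    $lines = [];
    while (($line = fgets($stream)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line === ".") {
            break; // a line holding only "." ends the message
        }
        $lines[] = $line;
    }
    return implode("\n", $lines);
}
```

On the command line this would be called as sketchReadMessage(STDIN).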
readPassword [line 1044]
Used to read a line of input from the command-line (on unix machines without echoing it)
Tags:
rorderCallback [line 968]
int rorderCallback(
string $word_doc_a, string $word_doc_b, [string $order_field = NULL])
|
|
Callback function used to sort documents by a field in reverse order. Should be initialized before use in usort with a call like: rorderCallback($tmp, $tmp, "field_want");
Tags:
Parameters
| string |
$word_doc_a |
doc id of first document to compare |
| string |
$word_doc_b |
doc id of second document to compare |
| string |
$order_field |
which field of these associative arrays to sort by |
setWorldPermissions [line 912]
void setWorldPermissions(
string $file)
|
|
This is a callback function used in the process of recursively chmoding to 777 all files in a folder
Tags:
Parameters
| string |
$file |
the filename or directory name to be chmod |
toBinString [line 540]
string toBinString(
string $str)
|
|
Converts a string to a string where each char has been replaced by its binary equivalent
Tags:
Parameters
| string |
$str |
what we want rewritten in binary |
toHexString [line 524]
string toHexString(
string $str)
|
|
Converts a string to a string where each char has been replaced by its hexadecimal equivalent
Tags:
Parameters
| string |
$str |
what we want rewritten in hex |
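Both conversions can be sketched with built-in functions. The names sketchToHexString and sketchToBinString are hypothetical, and the exact output layout (no separators, zero-padded digits) is an assumption; the library's output may be formatted differently.

```php
<?php
// Hypothetical sketch: two hex digits per byte, no separators.
function sketchToHexString(string $str): string
{
    return bin2hex($str);
}

// Hypothetical sketch: eight binary digits per byte, no separators.
function sketchToBinString(string $str): string
{
    $out = "";
    for ($i = 0, $n = strlen($str); $i < $n; $i++) {
        $out .= sprintf("%08b", ord($str[$i]));
    }
    return $out;
}
```

Functions like these are mainly useful when debugging packed strings such as those produced by packInt.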
unbase64Hash [line 686]
string unbase64Hash(
string $base64)
|
|
Decodes a crawl hash number from base64 to raw ASCII
Tags:
Parameters
| string |
$base64 |
a hash to decode |
unpackFloat [line 499]
float unpackFloat(
string $str)
|
|
Unpacks a float from a 4 char string
Tags:
Parameters
| string |
$str |
where to extract float from |
unpackInt [line 471]
int unpackInt(
string $str)
|
|
Unpacks an int from a 4 char string
Tags:
Parameters
| string |
$str |
where to extract int from |
unpackListModified9 [line 357]
array unpackListModified9(
string $encoded_list)
|
|
Decodes a single word with high two bits off according to Modified 9
Tags:
Parameters
| string |
$encoded_list |
4 byte string to decode |
unpackPosting [line 160]
array unpackPosting(
string $posting, int &$offset, [bool $dedelta = true], [bool $exact = false])
|
|
Given a packed integer string, uses the top three bytes to calculate a doc_index of a document in the shard, and uses the low order byte to compute the number of occurrences of a word in that document.
Tags:
Parameters
| string |
$posting |
a string containing a doc index, position list pair encoded using Modified 9 |
| int |
&$offset |
an offset into the string where the Modified 9 posting is encoded |
| bool |
$dedelta |
if true then assumes the list is a sequence of differences (a delta list) and undoes the difference to get the original sequence |
| bool |
$exact |
whether the supplied string is exactly one posting |
vByteDecode [line 98]
int vByteDecode(
string &$str, int &$offset)
|
|
Decodes from a string using variable byte coding an integer.
Tags:
Parameters
| string |
&$str |
string to use for decoding |
| int |
&$offset |
byte offset into string where var int is stored |
vByteEncode [line 80]
string vByteEncode(
int $pos_int)
|
|
Encodes an integer using variable byte coding.
Tags:
Parameters
| int |
$pos_int |
integer to encode |
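Variable byte coding stores an integer 7 bits at a time, using the high bit of each byte as a flag. The sketch below uses one common convention (low order bits first, high bit set when more bytes follow); whether the library uses the same byte layout is an assumption, and the names sketchVByteEncode and sketchVByteDecode are hypothetical. Only nonnegative integers are handled.

```php
<?php
// Hypothetical sketch; the byte layout is an assumption.
function sketchVByteEncode(int $value): string
{
    $out = "";
    while ($value >= 128) {
        // low 7 bits, with the high bit set to say "more bytes follow"
        $out .= chr(($value & 127) | 128);
        $value >>= 7;
    }
    return $out . chr($value); // last byte has the high bit clear
}

// Decodes one integer starting at $offset and advances $offset past it.
function sketchVByteDecode(string $str, int &$offset): int
{
    $value = 0;
    $shift = 0;
    do {
        $byte = ord($str[$offset++]);
        $value |= ($byte & 127) << $shift;
        $shift += 7;
    } while ($byte & 128);
    return $value;
}
```

Passing the offset by reference lets a caller decode a run of integers from one string by calling the decoder repeatedly.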
webdecode [line 719]
string webdecode(
string $str)
|
|
Decodes a string encoded by webencode
Tags:
Parameters
| string |
$str |
string to decode |
webencode [line 704]
string webencode(
string $str)
|
|
Encodes a string in a format suitable for POST data (mainly base64, but with characters that might corrupt the POST replaced via str_replace)
Tags:
Parameters
| string |
$str |
string to encode |
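A sketch of an encode and decode pair of this kind: base64, followed by a swap of the characters that have special meaning in form data. The names sketchWebEncode and sketchWebDecode and the specific replacement characters are assumptions, not taken from the source.

```php
<?php
// Hypothetical sketch; the replacement characters are an assumption.
function sketchWebEncode(string $str): string
{
    // "+", "/" and "=" all have special meaning in urls or form data
    return strtr(base64_encode($str), "+/=", "-_~");
}

function sketchWebDecode(string $str): string
{
    // undo the substitutions, then decode the base64
    return (string) base64_decode(strtr($str, "-_~", "+/="));
}
```

Because the substitution is one character for one character, decoding only has to reverse the mapping before calling base64_decode.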