src/library/Utility.php

SeekQuarry/Yioop -- Open Source Pure PHP Search Engine, Crawler, and Indexer

Copyright (C) 2009 - 2020 Chris Pollett chris@pollett.org

LICENSE:

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.

END LICENSE

A library of string, error reporting, log, hash, time, and conversion functions

Classes

Mod9Constants: mini-class (hence not in its own file) used to hold encode/decode info related to Mod9 encoding (a variant of Simplified-9 specific to Yioop).

Functions

addRegexDelimiters()

addRegexDelimiters(string  $expression) : string

Adds delimiters to a regex that may or may not have them

Parameters

string $expression

a regex

Returns

string —

regex with delimiters added if they were not already present

preg_search()

preg_search(string  $pattern, string  $subject, integer  $offset, boolean  $return_match = false) : mixed

Searches for a PCRE pattern in a subject starting from a given offset. Returns the position of the first match if found, -1 otherwise.

Parameters

string $pattern

a Perl compatible regular expression

string $subject

to search for pattern in

integer $offset

character offset into $subject to begin searching from

boolean $return_match

whether to return as well what the match was for the pattern

Returns

mixed —

if $return_match is false then the integer position of first match, otherwise, it returns the ordered pair [$pos, $match].
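The behaviour described for preg_search() can be sketched with PHP's preg_match() and its PREG_OFFSET_CAPTURE flag. The name pregSearchSketch is hypothetical, and returning a plain -1 on no match even when $return_match is true is an assumption. Note that preg_match() treats $offset as a byte offset, so for multibyte UTF-8 text it is not a character count.

```php
<?php
// Hypothetical sketch of preg_search(); not Yioop's own source.
function pregSearchSketch(string $pattern, string $subject, int $offset,
    bool $return_match = false)
{
    // PREG_OFFSET_CAPTURE makes each match an array [text, byte offset].
    if (preg_match($pattern, $subject, $matches, PREG_OFFSET_CAPTURE,
        $offset)) {
        [$match, $pos] = $matches[0];
        return $return_match ? [$pos, $match] : $pos;
    }
    return -1; // assumption: -1 on failure regardless of $return_match
}
```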

preg_offset_replace()

preg_offset_replace(string  $pattern, string  $replacement, string  $subject, integer  $offset) : string

Replaces a pcre pattern with a replacement in $subject starting from some offset.

Parameters

string $pattern

a Perl compatible regular expression

string $replacement

what to replace the pattern with

string $subject

to search for pattern in

integer $offset

character offset into $subject to begin searching from

Returns

string —

result of the replacements
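One plausible way to get the preg_offset_replace() behaviour is to leave the prefix before $offset untouched and run preg_replace() on the remainder only. This is a sketch with the hypothetical name pregOffsetReplaceSketch. One consequence of this approach is that anchors such as ^ and lookbehinds see $offset as the start of the string.

```php
<?php
// Hypothetical sketch: only the part of $subject from $offset on is rewritten.
function pregOffsetReplaceSketch(string $pattern, string $replacement,
    string $subject, int $offset): string
{
    $head = substr($subject, 0, $offset); // kept verbatim
    $tail = substr($subject, $offset);    // searched and replaced
    return $head . preg_replace($pattern, $replacement, $tail);
}
```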

parse_ini_with_fallback()

parse_ini_with_fallback(string  $file) : array

Yioop replacement for parse_ini_file($name, true) in case parse_ini_file is on the disable_functions list. The name has underscores to match the original function. This function checks whether parse_ini_file is disabled. If not, it just calls parse_ini_file; otherwise, it simulates it well enough that the configure.ini files used for string translations can be read.

Parameters

string $file

filename of ini data to parse into an array

Returns

array —

data parsed from the file

getIniAssignMatch()

getIniAssignMatch(array  $matches) : mixed

Auxiliary function called from parse_ini_with_fallback to extract, from the $matches array produced by that function's preg_match, what kind of assignment occurred in the ini file being parsed.

Parameters

array $matches

array of matches produced by a preg_match in parse_ini_with_fallback

Returns

mixed —

value of ini file assignment

charCopy()

charCopy(string  $source, string  &$destination, integer  $start, integer  $length, string  $timeout_msg = "") 

Copies $length bytes from the $source string, beginning at position $start, to the $destination string.

Parameters

string $source

string to copy from

string &$destination

string to copy to

integer $start

starting offset

integer $length

number of bytes to copy

string $timeout_msg

for long copies, a message to print if the copy takes more than 30 seconds

vByteEncode()

vByteEncode(integer  $pos_int) : string

Encodes an integer using variable byte coding.

Parameters

integer $pos_int

integer to encode

Returns

string —

a string of 1-5 chars depending on how big $pos_int was

vByteDecode()

vByteDecode(string  &$str, integer  $offset) : integer

Decodes an integer from a string using variable byte coding.

Parameters

string &$str

string to use for decoding

integer $offset

byte offset into the string where the var int is stored

Returns

integer —

the decoded integer
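The following is a sketch of variable byte coding in general. It stores 7 payload bits per byte, least significant group first, and sets the high bit on every byte except the last. That byte layout, the hypothetical names, and passing $offset by reference so it advances are all assumptions, and may not match Yioop's vByteEncode()/vByteDecode(). A 32-bit value needs at most ceil(32/7) = 5 bytes, which matches the "1-5 chars" above.

```php
<?php
// Assumed layout: 7 bits per byte, low-order group first,
// high bit set on every byte except the final one.
function vByteEncodeSketch(int $pos_int): string
{
    $out = "";
    while ($pos_int >= 0x80) {
        $out .= chr(($pos_int & 0x7F) | 0x80); // more bytes follow
        $pos_int >>= 7;
    }
    return $out . chr($pos_int); // last byte: high bit clear
}

// Reads one integer starting at $offset and advances $offset past it.
function vByteDecodeSketch(string $str, int &$offset): int
{
    $result = 0;
    $shift = 0;
    do {
        $byte = ord($str[$offset++]);
        $result |= ($byte & 0x7F) << $shift;
        $shift += 7;
    } while ($byte & 0x80);
    return $result;
}
```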

packPosting()

packPosting(integer  $doc_index, array  $position_list, boolean  $delta = true) : string

Makes a packed integer string from a doc index and the positions at which a word occurs in the document with that doc index.

Parameters

integer $doc_index

index (i.e., a count of which document it is rather than a byte offset) of a document in the document string

array $position_list

integer positions word occurred in that doc

boolean $delta

if true then stores the position_list as a sequence of differences (a delta list)

Returns

string —

a modified9 (our compression scheme) packed string containing this info.

unpackPosting()

unpackPosting(string  $posting, integer  &$offset, boolean  $dedelta = true) : array

Given a packed integer string, uses the top three bytes to calculate a doc_index of a document in the shard, and uses the low order byte to compute a number of occurrences of a word in that document.

Parameters

string $posting

a string containing a (doc index, position list) pair encoded using modified9

integer &$offset

an offset into the string where the modified9 posting is encoded

boolean $dedelta

if true then assumes the list is a sequence of differences (a delta list) and undoes the difference to get the original sequence

Returns

array —

consisting of integer doc_index and a subarray consisting of integer positions of word in doc.

addDocIndexPostings()

addDocIndexPostings(string  &$postings, integer  $add_offset) : string

This method is used while appending one index shard to another.

Given a string of postings, adds $add_offset to each posting's offset into the document map.

Parameters

string &$postings

a string of index shard postings

integer $add_offset

a fixed amount to add to each posting's doc map offset

Returns

string —

$new_postings where each doc offset has had $add_offset added to it

deltaList()

deltaList(array  $list) : array

Computes the difference of a list of integers.

That is, (a1, a2, a3, a4) becomes (a1, a2-a1, a3-a2, a4-a3).

Parameters

array $list

a nondecreasing list of integers

Returns

array —

the corresponding list of differences of adjacent integers

deDeltaList()

deDeltaList(array  $delta_list) : array

Given an array of differences of integers, reconstructs the original list. This computes the inverse of the deltaList function.

Parameters

array $delta_list

a list of nonnegative integers

Returns

array —

a nondecreasing list of integers
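As a sketch of the transformation and its inverse described above, here are two functions with hypothetical names. deDeltaList is just a running sum, which is why the two functions invert each other. Equal neighbours in a nondecreasing list produce zeros in the delta list.

```php
<?php
// (a1, a2, a3, ...) -> (a1, a2 - a1, a3 - a2, ...)
function deltaListSketch(array $list): array
{
    $delta = [];
    $previous = 0;
    foreach ($list as $value) {
        $delta[] = $value - $previous;
        $previous = $value;
    }
    return $delta;
}

// Inverse: running sums restore the original nondecreasing list.
function deDeltaListSketch(array $delta_list): array
{
    $list = [];
    $sum = 0;
    foreach ($delta_list as $delta) {
        $sum += $delta;
        $list[] = $sum;
    }
    return $list;
}
```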

encodeModified9()

encodeModified9(array  $list) : string

Encodes as a string a sequence of integers x such that 1 <= x <= 2^28 - 1. NOTICE: x >= 1.

The encoded string is a sequence of 4 byte words (packed ints). The high order 2 bits of a given word indicate whether or not to look at the next word. The codes are as follows: 11 start of encoded string, 10 continue four more bytes, 01 end of encoded string, and 00 whole sequence encoded in one word.

After the high order 2 bits, the next most significant bits indicate the format of the current word. There are nine possibilities:

00 - one 28 bit number
01 - two 14 bit numbers
10 - three 9 bit numbers
1100 - four 6 bit numbers
1101 - five 5 bit numbers
1110 - six 4 bit numbers
11110 - seven 3 bit numbers
111110 - twelve 2 bit numbers
111111 - twenty-four 1 bit numbers

Parameters

array $list

a list of positive integers satisfying the above bound

Returns

string —

encoded string
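To make the format table concrete, this sketch shows only the choice of format for the next word, not the bit packing. It tries the nine (count, bits) formats from the densest to the sparsest and takes the first one whose next count values all fit in that many bits, as Simple-9 style encoders do. The helper name is hypothetical. Requiring that a word be completely filled is also an assumption, since Yioop may handle a short final word differently.

```php
<?php
// [count, bits per number] from the nine formats listed above,
// ordered from most numbers per word to fewest.
const MOD9_FORMATS_SKETCH = [
    [24, 1], [12, 2], [7, 3], [6, 4], [5, 5], [4, 6], [3, 9], [2, 14], [1, 28],
];

// Hypothetical helper: returns [count, bits] for the densest format whose
// next `count` values, starting at $start, all fit in `bits` bits.
// Assumes every value satisfies 1 <= x <= 2^28 - 1.
function chooseMod9FormatSketch(array $list, int $start): array
{
    $remaining = count($list) - $start;
    foreach (MOD9_FORMATS_SKETCH as [$cnt, $bits]) {
        if ($cnt > $remaining) {
            continue; // assumption: only completely filled words are used
        }
        $max = (1 << $bits) - 1;
        $fits = true;
        for ($i = $start; $i < $start + $cnt; $i++) {
            if ($list[$i] > $max) {
                $fits = false;
                break;
            }
        }
        if ($fits) {
            return [$cnt, $bits];
        }
    }
    throw new InvalidArgumentException("value out of range");
}
```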

packListModified9()

packListModified9(integer  $continue_bits, integer  $cnt, array  $pack_list) : string

Packs the contents of a single word of a sequence being encoded using Modified9.

Parameters

integer $continue_bits

the high order 2 bits of the word

integer $cnt

the number of elements that will be packed in this word

array $pack_list

a list of positive integers to pack into word

Returns

string —

encoded 4 byte string

nextPostString()

nextPostString(string  &$input_string, integer  &$offset) : string

Returns the next complete posting string from $input_string, starting at $offset.

Does not do any decoding.

Parameters

string &$input_string

a string of postings

integer &$offset

an offset into this string, which is updated by the call

Returns

string —

undecoded posting

decodeModified9()

decodeModified9(string  $input_string, integer  &$offset) : array

Decodes a sequence of positive integers from a string that has been encoded using Modified 9.

Parameters

string $input_string

string to decode from

integer &$offset

where to start in the string; after decoding, it points to the position just past the decoded sequence.

Returns

array —

sequence of positive integers that were decoded

unpackListModified9()

unpackListModified9(string  $encoded_list) : array

Decodes a single word, whose high two bits are off, according to Modified 9.

Parameters

string $encoded_list

four byte string to decode

Returns

array —

sequence of integers that results from the decoding.

docIndexModified9()

docIndexModified9(integer  $encoded_list) : integer

Given an int encoding a doc_index followed by a position list using Modified 9, extracts just the doc_index.

Parameters

integer $encoded_list

in the just described format

Returns

integer —

a doc index into an index shard document map.

decodeQueueWeightInfo()

decodeQueueWeightInfo(integer  $weight_info, string  $crawl_order) : array

Used to decode priority queue page weight and crawl depth from an int used to code this information

Parameters

integer $weight_info

coding weight and depth

string $crawl_order

CrawlConstants code for page crawl order; if it is not CrawlConstants::PAGE_IMPORTANCE, then only depth info is stored in the priority queue

Returns

array —

ordered pair [$weight, $depth]

encodeQueueWeightInfo()

encodeQueueWeightInfo(integer  $weight, integer  $depth, string  $crawl_order) : integer

Packs an ordered pair of weight and depth info for a crawl priority url item into a single int.

Parameters

integer $weight

to be encoded

integer $depth

to be encoded

string $crawl_order

CrawlConstants code for page crawl order; if it is not CrawlConstants::PAGE_IMPORTANCE, then only depth info is stored in the priority queue

Returns

integer —

single int storing both pieces of information: weight in the high order 24 bits, depth in the low order 8 bits

adjustWeightCallback()

adjustWeightCallback(integer  $weight_info, integer  $adjustment) : integer

Given two ints encoding ($weight1, $depth1), ($weight2, $depth2) pairs computes an int encoding ($weight1 + $weight2, min($depth1, $depth2))

Parameters

integer $weight_info

coding weight and depth

integer $adjustment

coding an adjustment to weight and depth

Returns

integer —

$weight_info code for result pair
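This sketch uses the documented layout: weight in the high order 24 bits and depth in the low order 8 bits. The helpers are hypothetical. They omit the $crawl_order handling of the real functions, and in this version a weight sum above 2^24 - 1 would wrap. The adjustment follows the rule stated for adjustWeightCallback(): add the weights and keep the smaller depth.

```php
<?php
// Weight in the high order 24 bits, depth in the low order 8 bits.
function encodeQueueWeightSketch(int $weight, int $depth): int
{
    return (($weight & 0xFFFFFF) << 8) | ($depth & 0xFF);
}

function decodeQueueWeightSketch(int $weight_info): array
{
    return [($weight_info >> 8) & 0xFFFFFF, $weight_info & 0xFF];
}

// (w1, d1), (w2, d2) -> (w1 + w2, min(d1, d2)); sums above 2^24 - 1 wrap here.
function adjustWeightSketch(int $weight_info, int $adjustment): int
{
    [$w1, $d1] = decodeQueueWeightSketch($weight_info);
    [$w2, $d2] = decodeQueueWeightSketch($adjustment);
    return encodeQueueWeightSketch($w1 + $w2, min($d1, $d2));
}
```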

unpackInt()

unpackInt(string  $str) : integer

Unpacks an int from a 4 char string

Parameters

string $str

where to extract int from

Returns

integer —

extracted integer

packInt()

packInt(integer  $my_int) : string

Packs an int into a 4 char string

Parameters

integer $my_int

the integer to pack

Returns

string —

the packed string

unpackFloat()

unpackFloat(string  $str) : float

Unpacks a float from a 4 char string

Parameters

string $str

where to extract float from

Returns

float —

extracted float

packFloat()

packFloat(float  $my_float) : string

Packs a float into a 4 char string

Parameters

float $my_float

the float to pack

Returns

string —

the packed string

renameSerializedObject()

renameSerializedObject(string  $class_name, string  $object_string) : string

Used to change the namespace of a serialized PHP object (assumes the object has no nested subobjects)

Parameters

string $class_name

new fully qualified name with namespace

string $object_string

serialized object

Returns

string —

serialized object with new name
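A serialized object begins with a header of the form O:<name length>:"<class name>". The sketch below (hypothetical name) rewrites only that leading header, which is why nested subobjects, with headers of their own, are out of scope. A callback is used so that backslashes in a namespaced class name are not read as preg_replace() backreference syntax.

```php
<?php
// Hypothetical sketch: rewrite only the leading O:<len>:"<name>" header.
function renameSerializedObjectSketch(string $class_name,
    string $object_string): string
{
    return preg_replace_callback('/^O:\d+:"[^"]*"/',
        // The length prefix must be updated to the new name's byte length.
        fn($m) => 'O:' . strlen($class_name) . ':"' . $class_name . '"',
        $object_string, 1);
}
```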

getDomFromString()

getDomFromString(string  $to_parse) : \DOMDocument

Parses a provided string to make a DOM object. First tries to parse using XML and, if this fails, uses the more robust HTML DOM parser and manipulates the resulting DOM tree so that it corresponds to the original tags for XML that isn't HTML.

Parameters

string $to_parse

the string to parse a DOMDocument from

Returns

\DOMDocument —

parsed from the provided string

toHexString()

toHexString(string  $str) : string

Converts a string to a string where each char has been replaced by its hexadecimal equivalent

Parameters

string $str

what we want rewritten in hex

Returns

string —

the hexified string

toIntString()

toIntString(string  $str) : string

Converts a string to a string where each char has been replaced by its integer equivalent

Parameters

string $str

what we want rewritten as integers

Returns

string —

the string of integers

toBinString()

toBinString(string  $str) : string

Converts a string to a string where each char has been replaced by its binary equivalent

Parameters

string $str

what we want rewritten in binary

Returns

string —

the binary string

metricToInt()

metricToInt(string  $metric_num) : integer

Converts a string of the form some int followed by K, M, or G into its integer equivalent. For example, 4K would become 4000, 16M would become 16000000, and 1G would become 1000000000. Note: base 2 is not used for K, M, G.

Parameters

string $metric_num

metric number to convert

Returns

integer —

number the metric string corresponded to

intToMetric()

intToMetric(integer  $num) : string

Converts a number to a string followed by nothing, K, M, G, or T depending on whether the number is < 1000, < 10^6, < 10^9, < 10^12, or larger

Parameters

integer $num

number to convert

Returns

string —

the metric string corresponding to the number
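Here is a sketch of both conversions using the decimal (not base 2) multipliers described above. The function names are hypothetical. Accepting lowercase suffixes and truncating to whole units in intToMetricSketch (so 1500 becomes "1K") are assumptions about details the descriptions leave open.

```php
<?php
// Decimal multipliers: K = 10^3, M = 10^6, G = 10^9.
function metricToIntSketch(string $metric_num): int
{
    $multipliers = ["K" => 1000, "M" => 1000000, "G" => 1000000000];
    $metric_num = trim($metric_num);
    $suffix = strtoupper(substr($metric_num, -1));
    if (isset($multipliers[$suffix])) {
        return intval(substr($metric_num, 0, -1)) * $multipliers[$suffix];
    }
    return intval($metric_num); // no suffix: plain integer
}

// Assumption: truncates to a whole number of units, e.g. 1500 -> "1K".
function intToMetricSketch(int $num): string
{
    $units = ["T" => 10 ** 12, "G" => 10 ** 9, "M" => 10 ** 6, "K" => 1000];
    foreach ($units as $suffix => $size) {
        if ($num >= $size) {
            return intdiv($num, $size) . $suffix;
        }
    }
    return (string)$num;
}
```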

crawlLog()

crawlLog(string  $msg, string  $lname = null, boolean  $check_process_handler = false) 

Logs a message to a logfile or the screen

Parameters

string $msg

message to log

string $lname

name of log file in the LOG_DIR directory. Rotated logs will also use this as their basename, followed by a number, and are gzipped. (Older versions of Yioop used bzip; since some distros don't have bzip but do have gzip, and gzip was already being used elsewhere in Yioop, bzip was replaced to remove the dependency.)

boolean $check_process_handler

whether or not to call the processHandler to check how long the code has run since the last time processHandler called.

crawlTimeoutLog()

crawlTimeoutLog(mixed  $msg) 

Writes a log message $msg if more than LOG_TIMEOUT time has passed since the last time crawlTimeoutLog was called. Useful in loops to write a message as progress is made through the loop (not on every iteration, but, say, every 30 seconds).

Parameters

mixed $msg

usually a string with what to print out after the timeout period. If $msg === true, then clears the timeout cache

crawlHash()

crawlHash(string  $string, boolean  $raw = false) : string

Computes an 8 byte hash of a string for use in storing documents.

An eight byte hash was chosen so that the odds of collision, via the birthday problem, are still reasonable even for a few billion documents. If the raw flag is set to false, then an 11 byte base64 encoding of the 8 byte hash is returned. The hash is calculated as the XOR of the two halves of the 16 byte md5 of the string. (8 bytes take less storage, which is useful for keeping more doc info in memory.)

Parameters

string $string

the string to hash

boolean $raw

whether to leave raw or base 64 encode

Returns

string —

the hash of $string
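The hash described above can be sketched as the bytewise XOR of the two 8 byte halves of md5($string, true). The name crawlHashSketch is hypothetical. For the non-raw case, the sketch uses standard base64 with the trailing "=" removed, which gives the 11 characters mentioned above. Yioop's own base64Hash() alphabet may differ.

```php
<?php
// XOR-fold the raw 16-byte md5 into 8 bytes.
function crawlHashSketch(string $string, bool $raw = false): string
{
    $md5 = md5($string, true);                        // 16 raw bytes
    $hash = substr($md5, 0, 8) ^ substr($md5, 8, 8);  // bytewise XOR
    // 8 bytes -> 12 base64 chars with one "=", i.e. 11 after trimming.
    return $raw ? $hash : rtrim(base64_encode($hash), "=");
}
```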

crawlHashWord()

crawlHashWord(string  $string, boolean  $raw = false) : string

Used to create a 20 byte hash of a string (typically a word, or a phrase with a Wikipedia page). The format is the 8 byte crawlHash of the term (the two halves of the term's md5 XOR'd), followed by a \x00, followed by the first 11 characters of the term. If there are not enough chars to make 20 bytes, then the string is padded with \x00s to 20 bytes.

Parameters

string $string

word to hash

boolean $raw

whether to base64Hash the result

Returns

string —

the 8 byte crawlHash of $string, concatenated with \x00 to indicate the hash is of a word, not a phrase, concatenated with the first 11 bytes of $string padded with \x00s to 11 bytes.
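The 8 + 1 + 11 = 20 byte layout above can be sketched in the raw form only (the $raw option is omitted). The function name is hypothetical. It uses substr(), so it counts bytes, and a multibyte UTF-8 character may therefore be cut at the 11 byte boundary.

```php
<?php
// 8-byte XOR-folded md5, a \x00 marker, then the first 11 bytes of the term.
function crawlHashWordSketch(string $string): string
{
    $md5 = md5($string, true);
    $hash = substr($md5, 0, 8) ^ substr($md5, 8, 8);
    $suffix = str_pad(substr($string, 0, 11), 11, "\x00"); // pad short terms
    return $hash . "\x00" . $suffix;
}
```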

allCrawlHashPaths()

allCrawlHashPaths(string  $string, boolean  $raw = false) : array

Used to compute all hashes for a phrase based on each possible cond_max point. Here cond_max is the location of a substring of a phrase which is maximal.

Parameters

string $string

what to find hashes for

boolean $raw

whether to base64 the result

Returns

array —

of hashes with appropriate shifts if needed

crawlHashPath()

crawlHashPath(string  $string, integer  $path_start, boolean  $raw = false) : string

Given a string, makes a 20 byte hash path, where the first 8 bytes are a hash of the string before the path start, and the last 12 bytes are the path given by splitting on space and separately hashing each element according to the number of elements and the 3 bit selector below.

General format: (64 bit lead word hash, 3 bit selector, hashes of rest of words), according to:

Selector  Bits for each remaining word
001       29 32 32
010       29 16 16 16 16
011       29 16 16 8 8 8 8
100       29 16 16 8 8 4 4 4 4
101       29 16 16 8 8 4 4 2 2 2 2
110       29 16 16 8 8 4 4 2 2 1 1 1 1

In each row, the 3 selector bits plus the listed widths total 96 bits (the last 12 bytes).

If $path_start is 0, it behaves like crawlHashWord(). The above encoding is typically used to make word_ids for whole phrases; to make word ids for single words, the format is (64 bits for the word, 1 null byte, then 11 ignored bytes).

Parameters

string $string

what to hash

integer $path_start

what to use as the split between the 8 byte front hash and the rest

boolean $raw

whether to modified base64 the result

Returns

string —

the 20 bytes that result from this hash process

compareWordHashes()

compareWordHashes(string  $id1, string  $id2) : integer

Used to compare two ids for index dictionary lookup. Ids are an 8 byte crawlHash together with a 12 byte non-hash suffix.

Parameters

string $id1

20 byte word id to compare

string $id2

20 byte word id to compare

Returns

integer —

negative if $id1 smaller, positive if bigger, and 0 if same

base64Hash()

base64Hash(string  $string) : string

Converts a crawl hash number to something close to base64 encoding, but which doesn't get confused in URLs or DBs

Parameters

string $string

a hash to base64 encode

Returns

string —

the encoded hash

unbase64Hash()

unbase64Hash(string  $base64) : string

Decodes a crawl hash number from base64 back to its raw bytes

Parameters

string $base64

a hash to decode

Returns

string —

the decoded hash
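One common way to make base64 safe for URLs and databases is the URL-safe alphabet of RFC 4648 section 5: "-" and "_" replace "+" and "/", and "=" padding is dropped. The sketch below uses that scheme as an assumption, with hypothetical names. Yioop's exact character substitutions may differ.

```php
<?php
// Assumption: RFC 4648 URL-safe alphabet, no padding.
function base64HashSketch(string $string): string
{
    return rtrim(strtr(base64_encode($string), "+/", "-_"), "=");
}

function unbase64HashSketch(string $base64): string
{
    // Restore "=" padding to a multiple of 4 before decoding.
    $padded = str_pad($base64, (int)(ceil(strlen($base64) / 4) * 4), "=");
    return base64_decode(strtr($padded, "-_", "+/"));
}
```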

webencode()

webencode(string  $str) : string

Encodes a string in a format suitable for post data (mainly base64, but with str_replace applied to characters in the result that might mess up the post)

Parameters

string $str

string to encode

Returns

string —

encoded string

webdecode()

webdecode(string  $str) : string

Decodes a string encoded by webencode

Parameters

string $str

string to decode

Returns

string —

decoded string

crawlCrypt()

crawlCrypt(string  $string, integer  $salt = null) : string

The crawlCrypt function is used to encrypt passwords stored in the database.

It tries to use the best available version of the Blowfish variant of PHP's crypt function on the current system.

Parameters

string $string

the string to encrypt

integer $salt

salt value to be used (needed to verify if a password is valid)

Returns

string —

the crypted string
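Here is a sketch of Blowfish (bcrypt) hashing with PHP's crypt(). It assumes the "$2y$" prefix with cost 10 and a random 22 character salt. The name is hypothetical, and $salt is typed as a string here even though the signature above lists integer. Passing a stored hash back in as the salt re-hashes a candidate password for verification. For new code, PHP's built-in password_hash()/password_verify() wrap the same scheme.

```php
<?php
// Hypothetical sketch: bcrypt via crypt(); the cost and prefix are assumptions.
function crawlCryptSketch(string $string, ?string $salt = null): string
{
    if ($salt === null) {
        // bcrypt salts use ./A-Za-z0-9: map base64's "+" to "." and keep
        // 22 chars (this also drops the "==" padding of 16 encoded bytes).
        $chars = substr(strtr(base64_encode(random_bytes(16)), "+", "."), 0, 22);
        $salt = '$2y$10$' . $chars;
    }
    return crypt($string, $salt);
}
```

To check a password, compare crawlCryptSketch($candidate, $stored) with $stored using hash_equals(), which avoids timing leaks.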

partitionByHash()

partitionByHash(array  $table, string  $field, integer  $num_partition, integer  $instance, object  $callback = null) : array

Used by a controller to take a table and return those rows in the table that a given queue_server would be responsible for handling

Parameters

array $table

an array of rows of associative arrays which a queue_server might need to process

string $field

column of $table whose values should be used for partitioning

integer $num_partition

number of queue_servers to choose between

integer $instance

the id of the particular server we are interested in

object $callback

function or static method that might be applied to input before deciding the responsible queue_server. For example, if input was a url we might want to get the host before deciding on the queue_server

Returns

array —

the reduced table that the $instance queue_server is responsible for

calculatePartition()

calculatePartition(string  $input, integer  $num_partition, object  $callback = null) : integer

Used by a controller to say which queue_server should receive a given input

Parameters

string $input

can be viewed as a key that might be processed by a queue_server. For example, in some cases input might be a url and we want to determine which queue_server should be responsible for queuing that url

integer $num_partition

number of queue_servers to choose between

object $callback

function or static method that might be applied to input before deciding the responsible queue_server. For example, if the input was a url we might want to get the host before deciding on the queue_server

Returns

integer —

id of server responsible for input
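The two partitioning functions can be sketched together. Hash the key, after the optional callback, and reduce modulo the number of queue servers. Then keep the rows whose key maps to the given instance. This sketch makes several assumptions: the hypothetical names, the use of the first 4 bytes of md5 as the hash, and 0-based server ids. Yioop may derive the number differently. Passing a host-extracting callback sends all urls from one host to the same queue server.

```php
<?php
// Hypothetical sketch: md5-based partition, server ids 0..$num_partition-1.
function calculatePartitionSketch(string $input, int $num_partition,
    ?callable $callback = null): int
{
    if ($callback !== null) {
        $input = $callback($input); // e.g. reduce a url to its host
    }
    // Unsigned 32-bit big-endian int from the first 4 bytes of the md5.
    $value = unpack("N", md5($input, true))[1];
    return $value % $num_partition;
}

function partitionByHashSketch(array $table, string $field,
    int $num_partition, int $instance, ?callable $callback = null): array
{
    return array_values(array_filter($table,
        fn($row) => calculatePartitionSketch($row[$field], $num_partition,
            $callback) === $instance));
}
```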

changeInMicrotime()

changeInMicrotime(string  $start, string  $end = null) : float

Measures the change in time in seconds between two timestamps to microsecond precision

Parameters

string $start

starting time with microseconds

string $end

ending time with microseconds, if null use current time

Returns

float —

time difference in seconds
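This sketch assumes both timestamps use the "msec sec" string format that microtime() returns when called without arguments, for example "0.25000000 1600000000". The name is hypothetical, and the real function may accept other formats. Whole seconds and fractional parts are subtracted separately, which avoids adding a small fraction to a large float before subtracting.

```php
<?php
// Assumption: "msec sec" strings as produced by microtime().
function changeInMicrotimeSketch(string $start, ?string $end = null): float
{
    $end = $end ?? microtime();
    [$start_usec, $start_sec] = explode(" ", $start);
    [$end_usec, $end_sec] = explode(" ", $end);
    // Subtract whole seconds and fractions separately.
    return ((int)$end_sec - (int)$start_sec)
        + ((float)$end_usec - (float)$start_usec);
}
```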

microTimestamp()

microTimestamp() : string

Timestamp of the current time, in seconds since the start of the current epoch, with microsecond precision. Useful for situations where time() might cause too many collisions (account creation, etc.)

Returns

string —

timestamp, to the microsecond, of the time in seconds since the start of the current epoch

checkTimeInterval()

checkTimeInterval(string  $start_time, string  $duration, integer  $time = -1) : integer

Checks that a timestamp is within the time interval given by a start time (HH:mm) and a duration

Parameters

string $start_time

string of the form (HH:mm)

string $duration

string containing an int in seconds

integer $time

a Unix timestamp.

Returns

integer —

-1 if the time of day of $time is not within the given interval. Otherwise, the Unix timestamp at which the interval will be over for the same day as $time.

convertPixels()

convertPixels(string  $value) : integer

Converts a CSS unit string into its equivalent in pixels. This is used by @see SvgProcessor.

Parameters

string $value

a number followed by a legal CSS unit

Returns

integer —

a number in pixels
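This sketch handles CSS's absolute units, using the CSS definition 1in = 96px (so 1pt = 96/72 px and 1pc = 16px). The name is hypothetical. Treating unitless or unknown units as px, returning 0 for unparseable input, and not handling relative units such as em or % are all assumptions about what SvgProcessor needs.

```php
<?php
// Absolute CSS units in px, based on 1in = 96px.
function convertPixelsSketch(string $value): int
{
    $px_per_unit = ["px" => 1, "in" => 96, "cm" => 96 / 2.54,
        "mm" => 96 / 25.4, "pt" => 96 / 72, "pc" => 16];
    if (!preg_match('/^\s*(-?[\d.]+)\s*([a-z]*)\s*$/i', $value, $m)) {
        return 0; // assumption: unparseable input gives 0
    }
    $factor = $px_per_unit[strtolower($m[2])] ?? 1; // unitless/unknown: px
    return (int)round((float)$m[1] * $factor);
}
```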

makePath()

makePath(string  $path) : boolean

Creates folders along a filesystem path if they don't exist

Parameters

string $path

a file system path

Returns

boolean —

success or failure

deleteFileOrDir()

deleteFileOrDir(string  $file_or_dir) 

This is a callback function used in the process of recursively deleting a directory

Parameters

string $file_or_dir

the filename or directory name to be deleted

setWorldPermissions()

setWorldPermissions(string  $file) 

This is a callback function used in the process of recursively chmoding to 777 all files in a folder

Parameters

string $file

the filename or directory name to chmod

fileInfo()

fileInfo(string  $file) : array

This is a callback function used in the process of recursively calculating an array of file modification times and file sizes for a directory

Parameters

string $file

a name of a file in the file system

Returns

array —

an array whose single element contains an associative array with the size and modification time of the file

orderCallback()

orderCallback(string  $word_doc_a, string  $word_doc_b, string  $order_field = null) : integer

Callback function used to sort documents by a field

Should be initialized before using in usort with a call like: orderCallback($tmp, $tmp, "field_want");

Parameters

string $word_doc_a

doc id of first document to compare

string $word_doc_b

doc id of second document to compare

string $order_field

which field of these associative arrays to sort by

Returns

integer —

-1 if the first doc is bigger, 1 otherwise
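The "initialize, then usort" pattern described above can be sketched with a static variable. The initializing call stores the field name, and usort's later two-argument calls leave $order_field null, so the stored field is reused. The name is hypothetical. The sketch takes associative arrays, as the parameter descriptions imply, and treating a missing field as 0 is an assumption.

```php
<?php
// Hypothetical sketch of orderCallback's static-field pattern.
function orderCallbackSketch(array $word_doc_a, array $word_doc_b,
    ?string $order_field = null): int
{
    static $field = null;
    if ($order_field !== null) {
        $field = $order_field; // remembered for later usort calls
    }
    // -1 if the first doc is bigger, 1 otherwise: sorts in descending order.
    return (($word_doc_a[$field] ?? 0) > ($word_doc_b[$field] ?? 0)) ? -1 : 1;
}
```

Usage: call orderCallbackSketch($tmp, $tmp, "score") once, then usort($rows, "orderCallbackSketch").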

stringOrderCallback()

stringOrderCallback(string  $word_doc_a, string  $word_doc_b, string  $order_field = null) : integer

Callback function used to sort documents by a field, where the field is assumed to be a string

Should be initialized before using in usort with a call like: stringOrderCallback($tmp, $tmp, "field_want");

Parameters

string $word_doc_a

doc id of first document to compare

string $word_doc_b

doc id of second document to compare

string $order_field

which field of these associative arrays to sort by

Returns

integer —

-1 if the first doc is smaller, 1 otherwise

stringROrderCallback()

stringROrderCallback(string  $word_doc_a, string  $word_doc_b, string  $order_field = null) : integer

Callback function used to sort documents in reverse order by a field, where the field is assumed to be a string

Should be initialized before using in usort with a call like: stringROrderCallback($tmp, $tmp, "field_want");

Parameters

string $word_doc_a

doc id of first document to compare

string $word_doc_b

doc id of second document to compare

string $order_field

which field of these associative arrays to sort by

Returns

integer —

-1 if the first doc is bigger, 1 otherwise

rorderCallback()

rorderCallback(string  $word_doc_a, string  $word_doc_b, string  $order_field = null) : integer

Callback function used to sort documents by a field in reverse order

Should be initialized before using in usort with a call like: rorderCallback($tmp, $tmp, "field_want");

Parameters

string $word_doc_a

doc id of first document to compare

string $word_doc_b

doc id of second document to compare

string $order_field

which field of these associative arrays to sort by

Returns

integer —

1 if the first doc is bigger, -1 otherwise

lessThan()

lessThan(float  $a, float  $b) : integer

Callback to check if $a is less than $b

Used to help sort document results returned in PhraseModel called in IndexArchiveBundle

Parameters

float $a

first value to compare

float $b

second value to compare

Returns

integer —

-1 if $a is less than $b; 1 otherwise

greaterThan()

greaterThan(float  $a, float  $b) : integer

Callback to check if $a is greater than $b

Used to help sort document results returned in PhraseModel called in IndexArchiveBundle

Parameters

float $a

first value to compare

float $b

second value to compare

Returns

integer —

-1 if $a is greater than $b; 1 otherwise

e()

e(string  $text) 

shorthand for echo

Parameters

string $text

string to send to the current output

remoteAddress()

remoteAddress() 

Compute the real remote address of the incoming connection including forwarding

readInput()

readInput() : string

Used to read a line of input from the command-line

Returns

string —

from the command-line

readPassword()

readPassword() : string

Used to read a line of input from the command-line (on unix machines without echoing it)

Returns

string —

from the command-line

readMessage()

readMessage() : string

Used to read several lines from the terminal up until a last line consisting of just a "."

Returns

string —

from the command-line

mimeType()

mimeType(string  $file_name, boolean  $use_extension = false) : string

Returns the mime type of the provided file name if it can be determined.

Parameters

string $file_name

name of the file, including its path, to figure out the mime type for

boolean $use_extension

whether to just try to guess from the file extension rather than looking at the file

Returns

string —

mime type, or unknown if it can't be determined

generalIsA()

generalIsA(mixed  $class_1, mixed  $class_2) : boolean

Checks if class_1 is the same as class_2 or has class_2 as a parent. Behaves like the 3 param version (last param true) of PHP's is_a function, which came into being with Version 5.3.9.

Parameters

mixed $class_1

object or string class name to see if in class2

mixed $class_2

object or string class name to see if contains class1

Returns

boolean —

equal or contains class

stripAttributes()

stripAttributes(string  $start_tag_contents, array  $safe_attribute_list = array()) : string

Given the contents of a start XML/HTML tag, strips out all the attributes not listed in $safe_attribute_list

Parameters

string $start_tag_contents

the contents of an HTML/XML tag. I.e., if the tag was <tag stuff> then $start_tag_contents could be stuff

array $safe_attribute_list

a list of attributes which should be kept

Returns

string —

containing only safe attributes and their values

parseCsv()

parseCsv(string  $csv_string) : array

Used to parse into a two dimensional array a string that contains CSV data.

Parameters

string $csv_string

string with csv data

Returns

array —

two dimensional array of elements from csv
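Here is a sketch that parses CSV with PHP's built-in fgetcsv() over an in-memory stream, so quoted fields may contain commas and newlines. The name is hypothetical. Skipping blank lines is an assumption, and Yioop's parseCsv() may be implemented differently. The escape argument is passed explicitly because newer PHP versions deprecate relying on its default.

```php
<?php
// Hypothetical sketch using fgetcsv() on a php://memory stream.
function parseCsvSketch(string $csv_string): array
{
    $handle = fopen("php://memory", "r+");
    fwrite($handle, $csv_string);
    rewind($handle);
    $rows = [];
    while (($row = fgetcsv($handle, 0, ",", "\"", "\\")) !== false) {
        if ($row !== [null]) { // fgetcsv returns [null] for a blank line
            $rows[] = $row;
        }
    }
    fclose($handle);
    return $rows;
}
```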

arraytoCsv()

arraytoCsv(array  $arr) : string

Converts an array of values to a comma separated value formatted string.

Parameters

array $arr

values to convert

Returns

string —

CSV string after conversion

diff()

diff(string  $data1, string  $data2, boolean  $html = false) : string

Computes a Unix-style diff of two strings. That is it only outputs lines which disagree between the two strings. It outputs +line if a line occurs in the second but not first string and -line if a line occurs in the first string but not the second.

Parameters

string $data1

first string to compare

string $data2

second string to compare

boolean $html

whether to output html highlighting

Returns

string —

representing info about where $data1 and $data2 don't match

computeLCS()

computeLCS(array  $lines1, array  $lines2, integer  $offset) 

Computes the longest common subsequence of two arrays

Parameters

array $lines1

an array of lines to compute LCS of

array $lines2

an array of lines to compute LCS of

integer $offset

an offset to shift over array addresses in output by

extractLCSFromTable()

extractLCSFromTable(array  $lcs_moves, array  $lines, integer  $i, integer  $j, integer  $offset, array  &$lcs) 

Extracts, from a table of longest common subsequence moves (probably calculated by @see computeLCS) and a starting coordinate $i, $j in that table, a longest common subsequence

Parameters

array $lcs_moves

a table of moves computed by computeLCS

array $lines

from first of the two arrays computing LCS of

integer $i

a line number in string 1

integer $j

a line number in string 2

integer $offset

a number to add to each line number output into $lcs. This is useful if we have trimmed off the initially common lines from our two strings we are trying to compute the LCS of

array &$lcs

an array of triples (index_string1, index_string2, line): the indexes give the line number in each string, and line is the line the two strings have in common
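The diff() / computeLCS() approach can be sketched with the standard longest common subsequence dynamic program. The table entry $len[$i][$j] holds the LCS length of the suffixes starting at line $i and line $j. The walk then emits -line and +line only for lines outside the LCS. The names are hypothetical. The sketch omits the $html option and any trimming of common leading lines that the real code may do.

```php
<?php
// $len[$i][$j] = LCS length of $x[$i..] and $y[$j..].
function lcsTableSketch(array $x, array $y): array
{
    $m = count($x);
    $n = count($y);
    $len = array_fill(0, $m + 1, array_fill(0, $n + 1, 0));
    for ($i = $m - 1; $i >= 0; $i--) {
        for ($j = $n - 1; $j >= 0; $j--) {
            $len[$i][$j] = ($x[$i] === $y[$j]) ? $len[$i + 1][$j + 1] + 1
                : max($len[$i + 1][$j], $len[$i][$j + 1]);
        }
    }
    return $len;
}

// Outputs only disagreeing lines: -line (only in $data1), +line (only in $data2).
function diffSketch(string $data1, string $data2): string
{
    $x = explode("\n", $data1);
    $y = explode("\n", $data2);
    $len = lcsTableSketch($x, $y);
    $out = [];
    $i = 0;
    $j = 0;
    while ($i < count($x) && $j < count($y)) {
        if ($x[$i] === $y[$j]) {
            $i++;
            $j++;
        } elseif ($len[$i + 1][$j] >= $len[$i][$j + 1]) {
            $out[] = "-" . $x[$i++];
        } else {
            $out[] = "+" . $y[$j++];
        }
    }
    while ($i < count($x)) {
        $out[] = "-" . $x[$i++];
    }
    while ($j < count($y)) {
        $out[] = "+" . $y[$j++];
    }
    return implode("\n", $out);
}
```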

tail()

tail(string  $file_name, string  $num_lines) : array

Returns an array of the last $num_lines lines of a file

Parameters

string $file_name

name of file to return lines from

string $num_lines

number of lines to retrieve

Returns

array —

retrieved lines

lineFilter()

lineFilter(array  $lines, mixed  $filters) : array

Given an array of lines, returns a subarray of those lines containing the filter string (or any of the strings in the filter array)

Parameters

array $lines

to search

mixed $filters

either string to filter lines with or an array of strings (any of which can be present to pass the filter)

Returns

array —

lines containing the string

logLineTimestamp()

logLineTimestamp(string  $line) : integer

Tries to extract a timestamp from a line which is presumed to come from a Yioop log file

Parameters

string $line

to search

Returns

integer —

timestamp of that log entry

isPositiveInteger()

isPositiveInteger(mixed  $input) : boolean

Returns whether an input can be parsed to a positive integer

Parameters

mixed $input

value to check

Returns

boolean —

whether $input can be parsed to a positive integer.

garbageCollect()

garbageCollect() : integer

Runs various system garbage collection functions and returns number of bytes freed.

Returns

integer —

number of bytes freed

utf8SafeSaveHtml()

utf8SafeSaveHtml(\DOMDocument  $dom) : string

The DOM method saveHTML has a tendency to replace UTF-8, non-ASCII characters with HTML entities. This performs a save avoiding the replacement.

Parameters

\DOMDocument $dom

the DOMDocument to save as HTML

Returns

string —

output of saving html

utf8WordWrap()

utf8WordWrap(string  $string, integer  $width = 75, string  $break = "\n", boolean  $cut = false) : string

A UTF-8 safe version of PHP's wordwrap function that wraps a string to a given number of characters

Parameters

string $string

the input string

integer $width

the number of characters at which the string will be wrapped

string $break

string used to break a line into two

boolean $cut

whether to always force wrap at $width characters even if word hasn't ended

Returns

string —

the given string wrapped at the specified length