\seekquarry\yioop\library\BZip2BlockIterator

This class is used to iterate through a bzip2 file one block at a time.

The main advantage of this class over PHP's built-in bzip2 functions is that it can "remember" where it left off between serializations. As a result, it can continue where it left off across separate web invocations. Yioop uses this in archive crawls of wiki dumps so that the name server can pick up where it left off.

Summary

Public methods: __construct(), __wakeup(), eof(), close(), nextBlock(), packLeft()
Public properties: $fd, $file_offset, $buffer, $block, $bits, $num_extra_bits, $header_info
Constants: MAGIC, BLOCK_HEADER, BLOCK_ENDMARK, BLOCK_LEADER_RE, BLOCK_SIZE
No protected or private methods or properties.

Constants

MAGIC

String at the start of a file that identifies it as a bz2 file

BLOCK_HEADER

String at the start of each bz2 block

BLOCK_ENDMARK

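String that marks the end of the bz2 stream. It follows the last block, so a block ends where either the next block header or this endmark begins.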
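In the bzip2 format, the 4-byte stream header (`BZh` plus a block-size digit) is followed directly by the first block header, or by the endmark if the stream is empty. That first marker is therefore guaranteed to be byte-aligned; later ones are not. The Python sketch below checks which marker follows the header. It assumes the standard 48-bit bzip2 values 0x314159265359 for the block header and 0x177245385090 for the endmark. `first_marker` is a hypothetical helper, not part of the PHP class.

```python
import bz2

BLOCK_HEADER = bytes.fromhex("314159265359")   # 48-bit magic opening every block
BLOCK_ENDMARK = bytes.fromhex("177245385090")  # 48-bit end-of-stream magic


def first_marker(stream: bytes) -> str:
    """Classify the 48-bit marker that follows the 4-byte 'BZh<level>' header.

    Only this first marker is guaranteed to be byte-aligned; markers that
    follow compressed block data can start at any bit offset."""
    marker = stream[4:10]
    if marker == BLOCK_HEADER:
        return "block header"
    if marker == BLOCK_ENDMARK:
        return "endmark"
    return "unknown"


if __name__ == "__main__":
    print(first_marker(bz2.compress(b"hello")))  # block header
    print(first_marker(bz2.compress(b"")))       # endmark
```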

BLOCK_LEADER_RE

Blocks are NOT byte-aligned, so the block header (and endmark) may show up shifted right by 0 to 7 bits at various places throughout the file. This regular expression matches every possible shift of both the block header and the block endmark.
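The Python sketch below shows one way to build such a pattern set. It assumes the standard bzip2 48-bit magics: 0x314159265359 for the block header and 0x177245385090 for end of stream. For each shift s from 0 to 7, it keeps the five bytes that contain only magic bits, since the first and last byte of the window also hold neighbouring data. Each pattern maps to (shift, is_header), much like the $header_info property. All patterns are then joined into one alternation. A shift of 8 would be byte-aligned again, which is why eight shifts suffice. The names `shifted_patterns`, `LEADER_TABLE`, `LEADER_RE` and `locate` are hypothetical.

```python
import re

BLOCK_MAGIC = 0x314159265359  # bzip2 block header (BCD digits of pi)
EOS_MAGIC = 0x177245385090    # bzip2 end-of-stream marker (BCD digits of sqrt(pi))


def shifted_patterns(magic: int) -> dict:
    """Map each shift s in 0..7 to the 5 bytes that are fully determined
    when the 48-bit magic starts s bits into a byte."""
    patterns = {}
    for s in range(8):
        # 7-byte window: its top s bits and low 8 - s bits belong to
        # neighbouring data, but bytes 1..5 consist only of magic bits
        window = (magic << (8 - s)).to_bytes(7, "big")
        patterns[s] = window[1:6]
    return patterns


# Lookup table: 5-byte pattern -> (shift, is_block_header)
LEADER_TABLE = {}
for is_header, magic in ((True, BLOCK_MAGIC), (False, EOS_MAGIC)):
    for shift, pattern in shifted_patterns(magic).items():
        LEADER_TABLE[pattern] = (shift, is_header)

# One alternation matching any shifted block header or endmark
LEADER_RE = re.compile(b"|".join(re.escape(p) for p in LEADER_TABLE))


def locate(data: bytes):
    """Return (bit_offset, is_header) of the first leader in data, or None."""
    match = LEADER_RE.search(data)
    if match is None:
        return None
    shift, is_header = LEADER_TABLE[match.group()]
    # The matched bytes begin one byte after the byte holding the magic's
    # first bit, so step back one byte and add the shift
    return (match.start() - 1) * 8 + shift, is_header


if __name__ == "__main__":
    # Place the block header 27 bits into 16 zero bytes (byte 3, shift 3)
    sample = (BLOCK_MAGIC << (16 * 8 - 48 - 27)).to_bytes(16, "big")
    print(locate(sample))  # (27, True)
```

A single compiled alternation lets one regex scan find the earliest leader at any shift. The table then turns the matched bytes back into a bit offset without a second bit-level search.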

BLOCK_SIZE

How many bytes to read into the buffer from the bz2 stream at a time

Properties

$fd

$fd : resource

File handle for bz2 file

Type

resource

$file_offset

$file_offset : integer

Byte offset into bz2 file

Type

integer

$buffer

$buffer : string

Since block sizes are not constant, this buffer holds enough bytes read from the file to extract the next block properly

Type

string
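Because the stream is read BLOCK_SIZE bytes at a time and block lengths vary, a block leader can straddle two reads. A common remedy is to carry the last len(pattern) - 1 bytes of the buffer into the next read, so that no match is missed or reported twice. The Python sketch below shows this general technique. It is not taken from Yioop's code, and `find_in_chunks` is a hypothetical name.

```python
import io


def find_in_chunks(stream, pattern: bytes, chunk_size: int = 8192):
    """Yield the absolute offset of every occurrence of a non-empty pattern
    in a binary stream that is read chunk_size bytes at a time."""
    keep = len(pattern) - 1  # a match finished by new data has at most this many old bytes
    buffer = b""
    base = 0  # absolute offset of buffer[0] in the stream
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        buffer += chunk
        pos = buffer.find(pattern)
        while pos != -1:
            yield base + pos
            pos = buffer.find(pattern, pos + 1)
        # Keep only the tail that could begin a match completed by later data
        drop = max(len(buffer) - keep, 0)
        base += drop
        buffer = buffer[drop:]


if __name__ == "__main__":
    data = b"ABCxABCABxxABC"
    print(list(find_in_chunks(io.BytesIO(data), b"ABC", chunk_size=2)))  # [0, 4, 11]
```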

$block

$block : string

Used to build and store a bz2 block from the file stream

Type

string

$bits

$bits : integer

Stores the leftover bits of a bz2 block

Type

integer

$num_extra_bits

$num_extra_bits : integer

Stores how many leftover bits there are

Type

integer

$header_info

$header_info : array

Lookup table for the number of bits by which the magic number for the next block has been shifted right. The second component of each sub-array says whether the match is a block header or an endmark.

Type

array

Methods

__construct()

__construct(string  $path) 

Creates a new iterator over a bz2 file by opening the file, doing a sanity check, and then setting the initial file_offset to where the data starts.

Parameters

string $path

file path of bz2 file
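A minimal Python sketch of the open, sanity-check and set-offset steps follows. It assumes the standard bzip2 stream header: the three bytes `BZh` followed by a block-size digit from `1` to `9`, so compressed data starts at byte offset 4. The function `open_bz2` and its return tuple are hypothetical and are not the Yioop API.

```python
import bz2
import os
import tempfile


def open_bz2(path: str):
    """Open a bz2 file, check its header and return (file, data_offset, level).

    Hypothetical helper mirroring the steps described for __construct()."""
    f = open(path, "rb")
    header = f.read(4)
    # Standard bzip2 header: b"BZh" plus a block-size digit '1'..'9'
    if len(header) < 4 or header[:3] != b"BZh" or not 0x31 <= header[3] <= 0x39:
        f.close()
        raise ValueError(f"{path!r} is not a bz2 file")
    # The file position is now just past the header, where block data starts
    return f, f.tell(), header[3] - 0x30


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.bz2")
        with open(path, "wb") as out:
            out.write(bz2.compress(b"some text", 5))
        f, offset, level = open_bz2(path)
        print(offset, level)  # 4 5
        f.close()
```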

__wakeup()

__wakeup() 

Called by unserialize() when a serialized iterator is restored, before the iterator is used again
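The "remember where it left off" behaviour can be sketched in Python with pickle. Here `__setstate__` plays roughly the role that `__wakeup()` plays after PHP's `unserialize()`. Open file handles cannot be pickled, so the sketch drops the handle when saving, then reopens the file and seeks to the saved offset when loading. The class, its `path` attribute and its methods are hypothetical; the PHP class does not list a path property.

```python
import os
import pickle
import tempfile


class ResumableReader:
    """Reads a file chunk by chunk and survives pickling mid-way."""

    def __init__(self, path: str, chunk_size: int = 8192):
        self.path = path            # hypothetical: needed to reopen the file
        self.chunk_size = chunk_size
        self.offset = 0             # similar role to $file_offset
        self.fd = open(path, "rb")  # similar role to $fd

    def next_chunk(self) -> bytes:
        data = self.fd.read(self.chunk_size)
        self.offset += len(data)
        return data

    def close(self) -> None:
        self.fd.close()

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["fd"]  # drop the unpicklable file handle
        return state

    def __setstate__(self, state):
        # Restore the saved fields, then reopen the file and continue
        # where the previous run stopped
        self.__dict__.update(state)
        self.fd = open(self.path, "rb")
        self.fd.seek(self.offset)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        with open(path, "wb") as out:
            out.write(b"abcdefghij")
        reader = ResumableReader(path, chunk_size=4)
        print(reader.next_chunk())    # b'abcd'
        saved = pickle.dumps(reader)  # e.g. stored between web requests
        reader.close()
        resumed = pickle.loads(saved)
        print(resumed.next_chunk())   # b'efgh'
        resumed.close()
```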

eof()

eof() : boolean

Checks whether the end of the current bzip2 file has been reached

Returns

boolean —

whether the end of file has been reached

close()

close() : boolean

Used to close the file associated with this iterator

Returns

boolean —

whether the file close was successful

nextBlock()

nextBlock(boolean  $raw = false) 

Extracts the next bz2 block from the bzip2 file this iterator works on

Parameters

boolean $raw

if false, decompress the recovered block; if true, leave it compressed
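The Python sketch below illustrates the general idea of extracting a raw block and decompressing it on its own. It follows the approach of the bzip2recover tool rather than Yioop's code. The stream is turned into a string of bits and split at every occurrence of the 48-bit block magic, at any bit offset. Each block is then wrapped in a fresh `BZh` header, the end-of-stream magic and a stream CRC. This relies on two facts about the bzip2 format: each block stores its CRC in the 32 bits after its magic, and the combined CRC of a one-block stream equals that block's CRC. Bit strings are slow on large files, and the shift-table approach behind BLOCK_LEADER_RE is presumably meant to avoid that cost. All names here are hypothetical.

```python
import bz2

BLOCK_MAGIC = format(0x314159265359, "048b")  # block header as a bit string
EOS_MAGIC = format(0x177245385090, "048b")    # end-of-stream marker as a bit string


def to_bits(data: bytes) -> str:
    """Render bytes as '0'/'1' characters, most significant bit first."""
    return "".join(format(b, "08b") for b in data)


def from_bits(bits: str) -> bytes:
    """Pack a '0'/'1' string into bytes, zero-padding the final byte."""
    bits += "0" * (-len(bits) % 8)
    return int(bits, 2).to_bytes(len(bits) // 8, "big") if bits else b""


def split_blocks(stream: bytes) -> list:
    """Return the blocks of a single bz2 stream as bit strings.

    A magic-like bit pattern inside compressed data would cause a false
    split; bzip2recover accepts the same risk."""
    bits = to_bits(stream)
    bounds = []
    pos = bits.find(BLOCK_MAGIC, 32)  # skip the 4-byte "BZh<level>" header
    while pos != -1:
        bounds.append(pos)
        pos = bits.find(BLOCK_MAGIC, pos + 48)
    if not bounds:
        return []
    end = bits.find(EOS_MAGIC, bounds[-1] + 48)
    bounds.append(end if end != -1 else len(bits))
    return [bits[a:b] for a, b in zip(bounds, bounds[1:])]


def block_to_stream(block_bits: str, level: bytes = b"9") -> bytes:
    """Wrap one block in its own stream so a standard decoder accepts it."""
    block_crc = block_bits[48:80]  # 32-bit block CRC right after the magic
    # One-block stream: the combined stream CRC equals the block CRC
    return from_bits(to_bits(b"BZh" + level) + block_bits + EOS_MAGIC + block_crc)


if __name__ == "__main__":
    data = bytes(range(256)) * 1000  # about 256 KB of input
    stream = bz2.compress(data, 1)   # level 1: blocks of roughly 100 KB
    blocks = split_blocks(stream)
    level = stream[3:4]              # reuse the original block-size digit
    parts = [bz2.decompress(block_to_stream(b, level)) for b in blocks]
    print(len(blocks), b"".join(parts) == data)
```

Reusing the original block-size digit matters, because a decoder rejects a block larger than the size its stream header declares.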

packLeft()

packLeft(string&  $block, int&  $bits, string  $bytes, integer  $num_extra_bits) 

Computes the new bzip2 block portion and the bits left over after adding $bytes to the passed $block.

Parameters

string& $block

the block to add to

int& $bits

used to hold bits left over

string $bytes

what to add to the bzip2 block

integer $num_extra_bits

how many extra bits there are
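The Python sketch below shows the bit realignment that a method like packLeft() needs. It rests on an assumption: the job is to shift data that begins `shift` bits into its first byte left onto byte boundaries, while carrying leftover bits between calls. That is the role the documentation gives $bits and $num_extra_bits. The `carry` representation, the names `pack_left` and `flush`, and their signatures are hypothetical, not the PHP method's. `shift` must be between 0 and 7.

```python
def pack_left(block: bytearray, carry, data: bytes, shift: int):
    """Append data to block, moved left by `shift` bits onto byte boundaries.

    carry is the last input byte from the previous call (None at first);
    its low 8 - shift bits are still waiting to be emitted. Returns the new
    carry for the next call."""
    for byte in data:
        if carry is not None:
            # One output byte = low (8 - shift) bits of carry followed by
            # the top `shift` bits of the current byte
            pair = (carry << 8) | byte
            block.append((pair >> (8 - shift)) & 0xFF)
        carry = byte
    return carry


def flush(block: bytearray, carry, shift: int) -> None:
    """Emit the low 8 - shift bits still in carry, zero-padded on the right."""
    if carry is not None:
        block.append((carry << shift) & 0xFF)


if __name__ == "__main__":
    payload = b"\xde\xad\xbe\xef"
    # Three foreign bits (101), the payload, then five zero bits: 40 bits total
    stream = (((0b101 << 32) | int.from_bytes(payload, "big")) << 5).to_bytes(5, "big")
    out = bytearray()
    carry = pack_left(out, None, stream[:2], 3)  # input may arrive in pieces
    carry = pack_left(out, carry, stream[2:], 3)
    print(bytes(out) == payload)  # True
```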