seek_quarry

Class: WebArchiveBundle

Source Location: /lib/web_archive_bundle.php

Class Overview


A web archive bundle is a collection of web archives which are managed together.


Author(s):

  • Chris Pollett




Class Details

[line 61]
A web archive bundle is a collection of web archives which are managed together. It is useful to split data across several archive files rather than store it all in one, both for read efficiency and to keep file sizes from getting too big. In some places 4 byte ints are used to store file offsets, which restricts the size of the files that can be used for web archives.
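
To make the size restriction concrete, here is a minimal sketch of what happens when a byte offset is stored in 4 bytes. The helper names packOffset and unpackOffset and the use of pack("N") are assumptions for illustration only; this page does not say how the bundle actually encodes its offsets. The sketch also assumes a 64-bit PHP build.

```php
<?php
// Illustration only: store a byte offset as a 4 byte unsigned big-endian int.
// The encoding is an assumption, not the bundle's documented format.
function packOffset(int $offset): string
{
    return pack("N", $offset);
}

function unpackOffset(string $bytes): int
{
    // unpack() returns a 1-indexed array for an unnamed format code
    return unpack("N", $bytes)[1];
}

$max_offset = 0xFFFFFFFF; // largest offset that fits in 4 bytes

$ok = unpackOffset(packOffset(123456));               // round-trips unchanged
$wrapped = unpackOffset(packOffset($max_offset + 1)); // bits above 32 are dropped

echo strlen(packOffset(123456)), " bytes per stored offset\n";
echo "round trip: $ok\n";
echo "offset one past the limit is read back as: $wrapped\n";
```

An offset past the 4 byte limit cannot be read back correctly, which is one reason to keep each archive file small by splitting data across partitions.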




Tags:

author:  Chris Pollett


[ Top ]


Class Variables

$compressor =

[line 94]

Compressor object used to compress/uncompress data stored in the bundle



Type:   object


[ Top ]

$count =

[line 78]

Total number of page objects stored by this WebArchiveBundle


Type:   int


[ Top ]

$description =

[line 88]

A short text name for this WebArchiveBundle


Type:   string


[ Top ]

$dir_name =

[line 68]

Folder name to use for this WebArchiveBundle


Type:   string


[ Top ]

$partition = array()

[line 73]

Used to contain the WebArchive partitions of the bundle


Type:   array


[ Top ]

$read_only_archive =

[line 99]

Controls whether the archive was opened in read only mode


Type:   bool


[ Top ]

$write_partition =

[line 83]

The index of the partition to which new documents will be added


Type:   int


[ Top ]



Class Methods


static method getArchiveInfo [line 319]

static array getArchiveInfo( string $dir_name)

Gets information about a WebArchiveBundle out of its description.txt file



Tags:

return:  array containing the name (description) of the WebArchiveBundle, the number of items stored in it, and the number of WebArchive file partitions it uses.


Parameters:

string   $dir_name   folder name of the WebArchiveBundle to get info for

[ Top ]

static method getParamModifiedTime [line 356]

static void getParamModifiedTime( string $dir_name)

Returns the last time the archive info of the bundle was modified.



Parameters:

string   $dir_name   folder with archive bundle

[ Top ]

static method setArchiveInfo [line 343]

static void setArchiveInfo( string $dir_name, array $info)

Sets the archive info (DESCRIPTION, COUNT, NUM_DOCS_PER_PARTITION) for this web archive bundle



Parameters:

string   $dir_name   folder with archive bundle
array   $info   struct with above fields
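
The three static methods above all work on the bundle's description.txt file. The following is a rough, standalone sketch of that round trip, not the class's actual code. The sketch* function names are hypothetical, and storing the info struct as a serialized PHP array is an assumption; this page does not document the file format.

```php
<?php
// Hypothetical helpers. The on-disk format (a serialized array) is an
// assumption made for this sketch.
function sketchSetArchiveInfo(string $dir_name, array $info): void
{
    file_put_contents($dir_name . "/description.txt", serialize($info));
}

function sketchGetArchiveInfo(string $dir_name): array
{
    return unserialize(file_get_contents($dir_name . "/description.txt"));
}

function sketchGetParamModifiedTime(string $dir_name): int
{
    clearstatcache(); // avoid reading a stale cached modification time
    return filemtime($dir_name . "/description.txt");
}

// Work in a throwaway folder so the sketch leaves nothing behind.
$dir_name = sys_get_temp_dir() . "/bundle_sketch_" . uniqid();
mkdir($dir_name);

sketchSetArchiveInfo($dir_name, [
    "DESCRIPTION" => "test crawl",
    "COUNT" => 0,
    "NUM_DOCS_PER_PARTITION" => 50000,
]);
$info = sketchGetArchiveInfo($dir_name);
$modified = sketchGetParamModifiedTime($dir_name);

echo $info["DESCRIPTION"], " holds ", $info["COUNT"], " items\n";
echo "info last modified at Unix time $modified\n";

unlink($dir_name . "/description.txt");
rmdir($dir_name);
```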

[ Top ]

constructor __construct [line 112]

WebArchiveBundle __construct( string $dir_name, [ $read_only_archive = true], [int $num_docs_per_partition = NUM_DOCS_PER_GENERATION], [string $description = NULL], [string $compressor = "GzipCompressor"])

Creates a new WebArchiveBundle, or initializes an existing one, with the given characteristics



Parameters:

string   $dir_name   folder name of the bundle
bool   $read_only_archive   whether to open the archive in read only mode
int   $num_docs_per_partition   number of documents stored in a partition before a new web archive partition is started
string   $description   a short text name/description of this WebArchiveBundle
string   $compressor   name of the Compressor class used to compress/uncompress data stored in the bundle

[ Top ]

method addCount [line 298]

void addCount( int $num, [string $field = "COUNT"])

Updates the description file with the current count for the number of items in the WebArchiveBundle. If the $field argument is used, counts of additional properties (say, visited urls versus total urls) can be maintained.



Parameters:

int   $num   number of items to add to current count
string   $field   field of the info struct whose count is increased

[ Top ]

method addPages [line 174]

int addPages( string $offset_field, array &$pages)

Adds the array of $pages to the WebArchiveBundle. The pages are stored in the partition given by the write partition, and the resulting offsets are recorded in the field given by $offset_field.



Tags:

return:  the write_partition the pages were stored in


Parameters:

string   $offset_field   field used to record offsets after storing
array   &$pages   data to store
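
As a rough illustration of the behavior described above, the sketch below uses a hypothetical in-memory class, TinyBundle, in place of the real WebArchiveBundle. The rollover rule (start a new partition when a batch would push the current one past the limit) and the use of serialize() for page data are assumptions of this sketch, not documented behavior.

```php
<?php
// Hypothetical in-memory stand-in; not the real WebArchiveBundle.
class TinyBundle
{
    public $partition = [];      // partition index => concatenated page data
    public $counts = [];         // partition index => number of pages stored
    public $write_partition = 0; // partition that receives new pages
    public $num_docs_per_partition;

    public function __construct(int $num_docs_per_partition)
    {
        $this->num_docs_per_partition = $num_docs_per_partition;
        $this->startPartition(0);
    }

    private function startPartition(int $i): void
    {
        $this->write_partition = $i;
        $this->partition[$i] = "";
        $this->counts[$i] = 0;
    }

    // Stores $pages in the write partition, records each page's byte offset
    // under $offset_field, and returns the partition that was used.
    public function addPages(string $offset_field, array &$pages): int
    {
        $i = $this->write_partition;
        $would_overflow =
            $this->counts[$i] + count($pages) > $this->num_docs_per_partition;
        if ($this->counts[$i] > 0 && $would_overflow) {
            $i++; // assumed rollover rule, see the note above
            $this->startPartition($i);
        }
        foreach ($pages as $key => $page) {
            $pages[$key][$offset_field] = strlen($this->partition[$i]);
            $this->partition[$i] .= serialize($page);
            $this->counts[$i]++;
        }
        return $i;
    }
}

$bundle = new TinyBundle(2);
$first = [["url" => "a"], ["url" => "b"]];
$second = [["url" => "c"]];
$p1 = $bundle->addPages("offset", $first);
$p2 = $bundle->addPages("offset", $second);
echo "first batch went to partition $p1, second batch to partition $p2\n";
```

Because $pages is passed by reference, the caller can read the recorded offsets back from its own array after the call, together with the returned partition index.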

[ Top ]

method getPage [line 221]

array getPage( int $offset, int $partition, [resource $file_handle = NULL])

Gets a page from WebArchive $partition using the provided byte $offset, reusing the existing $file_handle if possible.



Tags:

return:  desired page


Parameters:

int   $offset   byte offset of page data
int   $partition   which WebArchive to look in
resource   $file_handle   file handle resource of $partition archive
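
The idea of fetching a record by byte offset, with an optional already-open file handle, can be sketched on a plain file as follows. The functions writePage and readPage and the record layout (a 4 byte length followed by serialized data) are hypothetical; the real WebArchive format and compression are not described on this page.

```php
<?php
// Hypothetical record layout: 4 byte big-endian length, then serialized page.
function writePage($fh, array $page): int
{
    fseek($fh, 0, SEEK_END);
    $offset = ftell($fh); // byte offset at which this record starts
    $data = serialize($page);
    fwrite($fh, pack("N", strlen($data)) . $data);
    return $offset;
}

function readPage(int $offset, string $file_name, $file_handle = null): array
{
    // Reuse the caller's handle when given one, otherwise open the file.
    $opened_here = ($file_handle === null);
    $fh = $opened_here ? fopen($file_name, "rb") : $file_handle;
    fseek($fh, $offset);
    $length = unpack("N", fread($fh, 4))[1];
    $page = unserialize(fread($fh, $length));
    if ($opened_here) {
        fclose($fh);
    }
    return $page;
}

$file_name = tempnam(sys_get_temp_dir(), "wab");
$fh = fopen($file_name, "w+b");
$offset_a = writePage($fh, ["url" => "http://example.com/a"]);
$offset_b = writePage($fh, ["url" => "http://example.com/b"]);
fflush($fh);

$page_b = readPage($offset_b, $file_name, $fh); // reuses the open handle
$page_a = readPage($offset_a, $file_name);      // opens and closes the file itself
fclose($fh);
unlink($file_name);

echo $page_a["url"], "\n", $page_b["url"], "\n";
```

Passing an open handle avoids reopening the file for every lookup, which matters when many pages are read from the same partition.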

[ Top ]

method getPartition [line 244]

object getPartition( int $index, [bool $fast_construct = true])

Gets an object encapsulating the $index-th WebArchive partition in this bundle.



Tags:

return:  the WebArchive file which was requested


Parameters:

int   $index   the number of the partition within this bundle to return
bool   $fast_construct   should the constructor of the WebArchive avoid reading in its info block.

[ Top ]

method initCountIfNotExists [line 276]

void initCountIfNotExists( [string $field = "COUNT"])

Creates a new counter to be maintained in the description.txt file if the counter doesn't exist; leaves it unchanged otherwise.



Parameters:

string   $field   field of info struct to add a counter for
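
The interplay between initCountIfNotExists and addCount can be sketched on a plain info array. The sketch* functions and the VISITED_URLS_COUNT field name are hypothetical, starting a new counter at 0 is an assumption, and writing the array back to description.txt is left out.

```php
<?php
// Hypothetical helpers that operate on an info array only; persisting the
// array to description.txt is omitted from this sketch.
function sketchInitCountIfNotExists(array &$info, string $field = "COUNT"): void
{
    if (!isset($info[$field])) {
        $info[$field] = 0; // assumed starting value for a new counter
    }
}

function sketchAddCount(array &$info, int $num, string $field = "COUNT"): void
{
    $info[$field] += $num;
}

$info = ["DESCRIPTION" => "test crawl", "COUNT" => 10];

sketchInitCountIfNotExists($info);                       // COUNT exists, left unchanged
sketchInitCountIfNotExists($info, "VISITED_URLS_COUNT"); // new counter is created
sketchAddCount($info, 5);
sketchAddCount($info, 3, "VISITED_URLS_COUNT");

print_r($info);
```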

[ Top ]

method setWritePartition [line 206]

void setWritePartition( $i)

Advances the index of the write partition by one and creates the corresponding web archive.



Parameters:

   $i  

[ Top ]


Documentation generated by phpDocumentor 1.4.3