seek_quarry

Class: RtfProcessor

Source Location: /lib/processors/rtf_processor.php

Class Overview

PageProcessor
   |
   --TextProcessor
      |
      --RtfProcessor

Used to create crawl summary information for RTF files


Author(s):

  • Chris Pollett

Methods


Inherited Constants

Inherited Variables

Inherited Methods

Class: TextProcessor

TextProcessor::calculateLang()
Tries to determine the language of the document by looking at the
TextProcessor::closeDanglingTags()
If the end of the file is reached before closing tags are seen, this method closes the open tags in the correct order.
TextProcessor::extractHttpHttpsUrls()
Tries to extract http or https links from a string of text.
TextProcessor::getBetweenTags()
Gets the text between two tags in a document starting at the current position.
TextProcessor::process()
Computes a summary based on a text string of a document

Class: PageProcessor

PageProcessor::__construct()
Sets up any indexing plugins associated with this page processor
PageProcessor::handle()
Method used to handle processing data for a web page. It makes
PageProcessor::process()
Should be implemented to compute a summary based on a

Class Details

[line 49]
Used to create crawl summary information for RTF files



Tags:

author:  Chris Pollett




Class Methods


static method extractText [line 88]

static string extractText( string $rtf_string)

Gets plain text out of an rtf string

Plain text is mainly extracted by getText(); this function does some pre- and post-processing, such as handling escaped braces.




Tags:

return:  plain text


Parameters:

string   $rtf_string   what to extract plain text out of
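
The pre- and post-processing of escaped braces described for extractText() can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the RtfProcessor source: the function names sketchProtectEscapedBraces() and sketchRestoreEscapedBraces() and the placeholder bytes are hypothetical, and the placeholders are assumed not to occur in the input.

```php
<?php
// Hypothetical placeholder bytes; assumed never to appear in real input.
const SKETCH_OPEN = "\x01";
const SKETCH_CLOSE = "\x02";

/**
 * Pre-processing: hide escaped braces (\{ and \}) so that a later
 * scan only sees braces that actually delimit rtf groups.
 */
function sketchProtectEscapedBraces(string $rtf_string): string
{
    return str_replace(
        ['\\{', '\\}'],
        [SKETCH_OPEN, SKETCH_CLOSE],
        $rtf_string
    );
}

/**
 * Post-processing: turn the placeholders back into literal braces
 * once the plain text has been extracted.
 */
function sketchRestoreEscapedBraces(string $text): string
{
    return str_replace([SKETCH_OPEN, SKETCH_CLOSE], ['{', '}'], $text);
}
```

Whatever extraction step runs between the two calls then never has to tell a literal brace from a group delimiter.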


static method getNextObject [line 149]

static string getNextObject( string $rtf_string, int $cur_pos)

Gets the contents of the rtf group at the current position in the string



Tags:

return:  contents of the rtf group


Parameters:

string   $rtf_string   data to get rtf group from
int   $cur_pos   position in $rtf_string at which to get group
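
One common way to find the group at a given position is to count brace depth until the opening brace is balanced. The sketch below shows that technique. It is not taken from rtf_processor.php: sketchNextGroup() is a hypothetical name, and returning the group with its outer braces (or an empty string when no balanced group starts at $cur_pos) is an assumption.

```php
<?php
/**
 * Hypothetical helper: returns the brace-delimited group starting at
 * $cur_pos, including its outer braces, or "" if none is found.
 */
function sketchNextGroup(string $rtf_string, int $cur_pos): string
{
    $len = strlen($rtf_string);
    if ($cur_pos < 0 || $cur_pos >= $len || $rtf_string[$cur_pos] !== '{') {
        return "";
    }
    $depth = 0;
    for ($i = $cur_pos; $i < $len; $i++) {
        $char = $rtf_string[$i];
        if ($char === '\\') {
            $i++; // skip the character after a backslash, e.g. \{ or \}
            continue;
        }
        if ($char === '{') {
            $depth++;
        } elseif ($char === '}') {
            $depth--;
            if ($depth === 0) {
                // Found the brace that closes the group opened at $cur_pos.
                return substr($rtf_string, $cur_pos, $i - $cur_pos + 1);
            }
        }
    }
    return ""; // the group was never closed
}
```

For the input `{\rtf1 {\b bold} text} tail`, position 0 yields the whole outer group and position 7 yields the nested `{\b bold}` group.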


static method getText [line 109]

static string getText( string $rtf_string)

Gets plain text out of an rtf string



Tags:

return:  plain text


Parameters:

string   $rtf_string   what to extract plain text out of
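
As a rough illustration of what getting plain text out of an rtf string involves, the sketch below strips control words and group braces with a regular expression. It is a deliberately naive stand-in, not the algorithm getText() uses: sketchStripControlWords() is a hypothetical name, and the sketch ignores escaped braces, hex escapes, and groups such as font tables whose contents are not document text.

```php
<?php
/**
 * Hypothetical helper: removes rtf control words and group braces,
 * leaving the remaining characters as plain text.
 */
function sketchStripControlWords(string $rtf_string): string
{
    // Remove control words such as \b, \par or \fs24, together with
    // the single optional space that delimits them.
    $text = preg_replace('/\\\\[a-zA-Z]+-?[0-9]* ?/', '', $rtf_string);
    // Remove the braces that delimit groups.
    $text = str_replace(['{', '}'], '', (string) $text);
    return trim($text);
}
```

A real extractor needs to walk the group structure instead, so that it can skip groups that carry formatting data rather than text.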


method process [line 62]

array process( string $page, string $url)

Computes a summary based on an rtf string of a document



Tags:

return:  a summary (title, description, links, and content) of the information in $page


Overrides TextProcessor::process() (Computes a summary based on a text string of a document)

Parameters:

string   $page   rtf string of a document
string   $url   location the document came from; not used by RtfProcessor at this point.
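
To make the shape of such a summary concrete, the sketch below builds an array from already-extracted plain text. Everything in it is an assumption made for illustration: sketchSummarize() is a hypothetical function, the array keys 'title', 'description', and 'links' are placeholders rather than the keys the library uses, and taking the first non-empty line as the title is a guess, not documented behavior.

```php
<?php
/**
 * Hypothetical helper: builds a summary array from plain text.
 * The keys used here are illustrative, not the library's own.
 */
function sketchSummarize(string $text): array
{
    $text = trim($text);
    // Assumption: treat the first non-empty line as the title.
    $lines = preg_split('/\R/', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    // Collect http and https links found in the text.
    preg_match_all('@https?://[^\s<>"]+@i', $text, $matches);
    return [
        'title' => $lines[0] ?? '',
        'description' => $text,
        'links' => array_values(array_unique($matches[0])),
    ];
}
```

The $url argument is left out of the sketch because the documentation states that RtfProcessor does not use it at this point.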



Documentation generated by phpDocumentor 1.4.3