seek_quarry

Class: RobotProcessor

Source Location: /lib/processors/robot_processor.php

Class Overview

PageProcessor
   |
   --RobotProcessor

Processor class used to extract information from robots.txt files


Author(s):

  • Chris Pollett

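Methods

RobotProcessor::makeCanonicalRobotPath()
Treats foo the same as /foo and decodes URL-encoded characters except %2F
RobotProcessor::process()
Parses the contents of a robots.txt page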

Inherited Variables

Inherited Methods

Class: PageProcessor

PageProcessor::__construct()
Sets up any indexing plugins associated with this page processor
PageProcessor::handle()
Method used to handle processing data for a web page. It makes
PageProcessor::process()
Should be implemented to compute a summary based on a

Class Details

[line 54]
Processor class used to extract information from robots.txt files



Tags:

author:  Chris Pollett




Class Methods


method makeCanonicalRobotPath [line 187]

void makeCanonicalRobotPath( $path)

For robot paths, foo is treated the same as /foo. The path might contain URL-encoded characters. These are all decoded except for %2F, which corresponds to a / (this is as per http://www.robotstxt.org/norobots-rfc.txt).




Parameters:

   $path  
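A minimal PHP sketch of the canonicalization rule described above, assuming the path is a plain string taken from an Allow or Disallow line. The function name canonicalRobotPath() is hypothetical; this is an illustration, not the class's actual source. It protects %2F by rewriting it to %252F before a single decoding pass, so every other escape is decoded while the encoded slash survives (normalized to uppercase). Using rawurldecode() rather than urldecode(), so that a literal + is not turned into a space, is likewise an assumption of this sketch.

```php
<?php
// Hypothetical sketch of robots.txt path canonicalization;
// canonicalRobotPath() is an illustrative name, not RobotProcessor's code.
function canonicalRobotPath(string $path): string
{
    // "foo" and "/foo" are treated the same: ensure a leading slash.
    if ($path === "" || $path[0] !== "/") {
        $path = "/" . $path;
    }
    // Shield %2F (an encoded "/"): %252F decodes back to the text "%2F".
    $path = preg_replace('/%2F/i', '%252F', $path);
    // Decode every other %xx escape in one pass. rawurldecode() leaves "+"
    // alone (an assumption: urldecode() would turn it into a space).
    return rawurldecode($path);
}
```

For example, canonicalRobotPath("foo") gives "/foo", canonicalRobotPath("/a%20b") gives "/a b", and canonicalRobotPath("/a%2fb") keeps the encoded slash as "/a%2Fb".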


method process [line 71]

array process( string $page, string $url)

Parses the contents of a robots.txt page, extracting allowed paths, disallowed paths, the crawl-delay, and sitemaps. We also extract a list of all user-agent strings seen.



Tags:

return:  a summary (title, description, links, and content) of the information in $page


Overrides PageProcessor::process() (Should be implemented to compute a summary based on a)

Parameters:

string   $page   text string of a document
string   $url   location the document came from; not used by TextProcessor at this point. Some of its subclasses override this method and use $url to produce complete links for relative links within a document.
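A minimal PHP sketch of the kind of parsing process() describes: collecting allowed and disallowed paths, the crawl-delay, and sitemaps, plus every user-agent string seen. The function parseRobotsTxt(), its $agent parameter, and the keys of the returned array are hypothetical and do not reflect RobotProcessor's real summary format. As a simplification, it merges the rules of every group whose User-agent is * or a substring of $agent; the robots exclusion standard instead prefers the most specific matching group.

```php
<?php
// Hypothetical sketch of robots.txt parsing; parseRobotsTxt() and the array
// keys it returns are illustrative assumptions, not Yioop's actual format.
function parseRobotsTxt(string $page, string $agent): array
{
    $result = ["ALLOWED" => [], "DISALLOWED" => [], "CRAWL_DELAY" => 0,
        "SITEMAPS" => [], "AGENTS" => []];
    $inGroup = false;      // do the current rules apply to $agent (or *)?
    $lastWasAgent = false; // consecutive User-agent lines share one group
    foreach (preg_split('/\r\n|\r|\n/', $page) as $line) {
        // Strip comments and surrounding whitespace.
        $line = trim(preg_replace('/#.*$/', '', $line));
        if ($line === "" || strpos($line, ":") === false) {
            continue;
        }
        // Split only on the first ":" so sitemap URLs stay intact.
        [$field, $value] = array_map('trim', explode(":", $line, 2));
        $field = strtolower($field);
        if ($field === "user-agent") {
            $result["AGENTS"][] = $value;
            $matches = $value === "*" ||
                ($value !== "" && stripos($agent, $value) !== false);
            // A User-agent line after a rule line starts a new group.
            $inGroup = $lastWasAgent ? ($inGroup || $matches) : $matches;
            $lastWasAgent = true;
            continue;
        }
        $lastWasAgent = false;
        if ($field === "sitemap") {
            $result["SITEMAPS"][] = $value; // sitemaps apply to all agents
        } elseif ($inGroup && $field === "allow" && $value !== "") {
            $result["ALLOWED"][] = $value;
        } elseif ($inGroup && $field === "disallow" && $value !== "") {
            $result["DISALLOWED"][] = $value;
        } elseif ($inGroup && $field === "crawl-delay" && is_numeric($value)) {
            $result["CRAWL_DELAY"] = (int)$value;
        }
    }
    $result["AGENTS"] = array_values(array_unique($result["AGENTS"]));
    return $result;
}
```

Given a file with a `User-agent: *` group (Disallow: /private/, Allow: /private/public.html, Crawl-delay: 5), a separate `User-agent: otherbot` group, and a Sitemap line, calling parseRobotsTxt($page, "yioopbot") would return only the * group's paths and delay, the sitemap URL, and both agent strings. An empty Disallow value is skipped here because it allows everything.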



Documentation generated by phpDocumentor 1.4.3