seek_quarry

Class: FetchController

Source Location: /controllers/fetch_controller.php

Class Overview

Controller
   |
   --FetchController

This class handles data coming to a queue_server from a fetcher. Basically, it receives the data from the fetcher and saves it into various files for later processing by the queue server.


Author(s):

  • Chris Pollett



Inherited Methods

Class: Controller

Controller::__construct()
Controller::checkCSRFToken()
Checks whether the form's CSRF (cross-site request forgery prevention) token matches the given user and has not expired (tokens expire after 1 hour)
Controller::checkRequest()
Checks whether a request is for a valid activity and whether it uses the correct authorization key
Controller::clean()
Used to clean strings that might be tainted because they originate from the user
Controller::displayView()
Sends the provided view to output, drawing it with the given data variable and using the current locale for translation and writing mode
Controller::generateCSRFToken()
Generates a cross-site request forgery prevention token based on the provided user name, the current time, and the hidden AUTH_KEY
Controller::processRequest()
This function should be overridden to handle web requests

Class Details

[line 57]
This class handles data coming to a queue_server from a fetcher. Basically, it receives the data from the fetcher and saves it into various files for later processing by the queue server.

This class can also be used by a fetcher to get status information.




Tags:

author:  Chris Pollett




Class Variables

$activities = array("schedule", "archiveSchedule", "update",
        "crawlTime")

[line 73]

These are the activities supported by this controller


Type:   array



$models = array("machine", "crawl", "cron")

[line 63]

Models used by this controller: machine, crawl, and cron


Type:   array
Overrides:   Array



$views = array("fetch")

[line 68]

Load FetchView to return results to fetcher


Type:   array
Overrides:   Array





Class Methods


method addScheduleToScheduleDirectory [line 443]

void addScheduleToScheduleDirectory( string $schedule_name, string &$data_string)

Adds a file with contents $data_string and with a name containing $address and $time to a subfolder $day of a folder $dir



Parameters:

string   $schedule_name   the name of the kind of schedule being saved
string   &$data_string   encoded, compressed, serialized data the schedule is to contain
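
As an illustration of the file placement described above, here is a minimal sketch that only builds the target path. The helper name schedulePath, the use of whole days since the Unix epoch for $day, and the file name format are all assumptions made for this example; they are not taken from the FetchController source.

```php
<?php
/**
 * Hypothetical helper (not part of FetchController): builds the path at
 * which a schedule file could be stored. The $dir/$day/... layout and the
 * file name format are assumptions made for illustration.
 */
function schedulePath($dir, $schedule_name, $address, $time)
{
    // Whole days since the Unix epoch, used here as the subfolder name
    $day = (int) floor($time / 86400);
    // Replace characters that are awkward in file names (dots, colons, ...)
    $safe_address = preg_replace('/[^A-Za-z0-9]/', '-', $address);
    return "$dir/$day/{$schedule_name}_{$safe_address}_{$time}.txt";
}
```

Under these assumptions, data uploaded by different fetchers at different times lands in distinct files, grouped by day.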


method archiveSchedule [line 180]

void archiveSchedule( )

Checks to see whether there are more pages to extract from the current archive, and if so returns the next batch to the requesting fetcher. The iteration progress is automatically saved on each call to nextPages, so that the next fetcher will get the next batch of pages. If there is no current archive to iterate over, or the iterator has reached the end of the archive, then indicate that there is no more data by setting the status to NO_DATA_STATE.
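
The batching behaviour described above can be sketched as follows. SketchArchiveIterator, archiveScheduleSketch, the atEnd method, and the 'HAS_DATA' status string are hypothetical names for this example; only nextPages and NO_DATA_STATE come from the description, and the real iterator persists its progress to disk rather than keeping it in memory.

```php
<?php
/** Hypothetical in-memory stand-in for an archive iterator. */
class SketchArchiveIterator
{
    private $pages;
    private $position = 0;

    public function __construct(array $pages)
    {
        $this->pages = $pages;
    }

    /** Returns up to $num pages and advances the saved position. */
    public function nextPages($num)
    {
        $batch = array_slice($this->pages, $this->position, $num);
        $this->position += count($batch);
        return $batch;
    }

    /** True once every page has been handed out. */
    public function atEnd()
    {
        return $this->position >= count($this->pages);
    }
}

/**
 * Hypothetical version of the activity: returns the next batch, or a
 * NO_DATA_STATE status when there is no archive or it is exhausted.
 */
function archiveScheduleSketch($iterator, $batch_size)
{
    if ($iterator === null || $iterator->atEnd()) {
        return array('status' => 'NO_DATA_STATE', 'pages' => array());
    }
    // 'HAS_DATA' is a placeholder status invented for this sketch
    return array('status' => 'HAS_DATA',
        'pages' => $iterator->nextPages($batch_size));
}
```

Because the position advances inside nextPages, two fetchers that call the activity one after the other receive consecutive, non-overlapping batches.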





method crawlTime [line 473]

void crawlTime( )

Checks for the crawl time according either to crawl_status.txt or to network_status.txt, and presents it to the requesting fetcher, along with a list of available queue servers.




method doCronTasks [line 541]

void doCronTasks( )

Used to do periodic maintenance tasks for the Name Server.

For now, it just checks whether any fetchers which the user turned on have crashed, and if so restarts them.





method getCrawlTimes [line 555]

array getCrawlTimes( )

Gets a list of all the timestamps of previously stored crawls

This could probably be moved to the crawl model. It is a little lighter than getCrawlList and should only be used with a name server, so it is left here to avoid confusion.




Tags:

return:  list of timestamps



method handleUploadedData [line 388]

string handleUploadedData( [string $filename = ""])

After robot, schedule, and index data have been uploaded and reassembled as one big data file/string, this function splits that string into each of these data types and then saves the results into the appropriate schedule sub-folder. Any temporary files used during uploading are then deleted.




Tags:

return:  diagnostic info to be sent to fetcher about what was done


Parameters:

string   $filename   name of temp file used to upload big string. If uploaded data was small enough to be uploaded in one go, then this should be "" -- the variable $_REQUEST["part"] will be used instead


method processRequest [line 87]

void processRequest( )

Checks that the request seems to be coming from a legitimate fetcher, then determines which activity the fetcher is requesting and calls that activity for processing.



Overrides Controller::processRequest() (This function should be overridden to handle web requests)
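
A minimal sketch of this check-then-dispatch pattern is given below. SketchFetchController, the request key 'a', and the $request_is_valid flag (standing in for the result of Controller::checkRequest()) are assumptions made for illustration; the activity names are the ones listed in $activities.

```php
<?php
/** Hypothetical stand-in for FetchController; not the actual source. */
class SketchFetchController
{
    /** Activities this controller is willing to dispatch to. */
    public $activities = array("schedule", "archiveSchedule", "update",
        "crawlTime");

    /** Records which activities ran, so the dispatch can be observed. */
    public $log = array();

    /**
     * Returns the name of the activity that was run, or null if the
     * request was rejected. The key 'a' is an assumed parameter name.
     */
    public function processRequest(array $request, $request_is_valid)
    {
        if (!$request_is_valid || !isset($request['a'])) {
            return null;
        }
        $activity = $request['a'];
        // Only whitelisted activities may be called by name
        if (!in_array($activity, $this->activities, true)) {
            return null;
        }
        $this->$activity();
        return $activity;
    }

    public function schedule() { $this->log[] = "schedule"; }
    public function archiveSchedule() { $this->log[] = "archiveSchedule"; }
    public function update() { $this->log[] = "update"; }
    public function crawlTime() { $this->log[] = "crawlTime"; }
    public function doCronTasks() { $this->log[] = "doCronTasks"; }
}
```

Checking the requested name against $activities before the variable method call means that public helpers such as doCronTasks cannot be invoked directly by a request.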


method schedule [line 115]

void schedule( )

Checks whether a schedule of sites to crawl is available and, if so, presents it to the requesting fetcher and then deletes it.




method update [line 274]

void update( )

Processes Robot, To Crawl, and Index data sent from a fetcher. Acknowledges to the fetcher whether this data was received okay.





Class Constants

CRON_INTERVAL =  300

[line 81]

Number of seconds that must elapse after the last call before doing cron activities (mainly checking the liveness of fetchers which should be alive)
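
The interval check this constant implies might look like the sketch below. The function cronIsDue and its timestamp arguments are hypothetical, and treating exactly CRON_INTERVAL elapsed seconds as "due" is an assumption; only the value 300 comes from the documentation.

```php
<?php
// Value taken from the CRON_INTERVAL constant documented above
const CRON_INTERVAL = 300;

/**
 * Hypothetical helper: true when at least CRON_INTERVAL seconds have
 * passed between $last_run and $now (both Unix timestamps).
 */
function cronIsDue($last_run, $now)
{
    return ($now - $last_run) >= CRON_INTERVAL;
}
```

A caller would typically pass time() as $now and the stored timestamp of the previous run as $last_run, updating the stored value whenever the cron tasks actually run.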






Documentation generated by phpDocumentor 1.4.3