Constants

CRON_INTERVAL


Number of seconds that must elapse after the last call before doing cron activities (mainly checking the liveness of fetchers that should be alive).
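A minimal sketch of the elapsed-time check this constant implies. It is written in Python for illustration, not taken from Yioop's PHP code. The 300-second value and the function name cron_due are hypothetical, because the real value of CRON_INTERVAL is not shown in this reference.

```python
import time

# Hypothetical value; the actual CRON_INTERVAL in Yioop is not given in this reference.
CRON_INTERVAL = 300


def cron_due(last_call_time: float, now: float, interval: int = CRON_INTERVAL) -> bool:
    """Return True once at least `interval` seconds have passed since the last call."""
    return now - last_call_time >= interval


if __name__ == "__main__":
    # A last call 400 seconds ago is older than the assumed 300-second interval.
    print(cron_due(time.time() - 400, time.time()))
```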

Properties

$web_site

$web_site : \seekquarry\yioop\library\WebSite

Stores a reference to the web server when Yioop runs in CLI mode; in non-CLI mode it acts as a request router.

In CLI mode it is also useful for caching files in RAM as they are read.

Type

\seekquarry\yioop\library\WebSite

$component_instances

$component_instances : array

Array of instances of components used by this controller

Type

array

$view_instances

$view_instances : array

Array of instances of views used by this controller

Type

array

$model_instances

$model_instances : array

Array of instances of models used by this controller

Type

array

$plugin_instances

$plugin_instances : array

Array of instances of indexing plugins used by this controller

Type

array

$activities

$activities : array

These are the activities supported by this controller

Type

array

$activity_component

$activity_component : array

Associative array of activity => component that the activity lives on; used by the Controller::call() method to invoke a given activity on the appropriate component

Type

array

$component_activities

$component_activities : array

Associative array of component => activities for this controller. Components are collections of activities (a little like traits) which can be reused.

Type

array
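The two maps above are related: $activity_component can be derived by inverting $component_activities. The Python sketch below shows that inversion, assuming each activity belongs to exactly one component. The function name and the component/activity names in the usage note are hypothetical.

```python
from typing import Dict, List


def build_activity_component(component_activities: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert a component => [activities] map into an activity => component map."""
    activity_component: Dict[str, str] = {}
    for component, activities in component_activities.items():
        for activity in activities:
            # Assumption: an activity name may only be claimed by one component.
            if activity in activity_component:
                raise ValueError(f"activity {activity!r} is defined by more than one component")
            activity_component[activity] = component
    return activity_component
```

For example, {"crawl": ["manageCrawls", "pageOptions"], "system": ["configure"]} would map "pageOptions" to "crawl". Raising on a duplicate is a design choice of this sketch; letting the later component win silently would be another option.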

$crawl_status_file_name

$crawl_status_file_name : string

Name of the file used to store info about the status of a queue server's active crawl. Defaults to channel 0 but might change in

Type

string

Methods

__construct()

__construct(\seekquarry\yioop\library\WebSite  $web_site = null) 

Sets up component activities, instance arrays, and plugins.

Parameters

\seekquarry\yioop\library\WebSite $web_site

is the web server when Yioop runs in CLI mode; in non-CLI mode it acts as a request router. In CLI mode it is also useful for caching files in RAM as they are read

processRequest()

processRequest() 

Checks that the request seems to be coming from a legitimate fetcher then determines which activity the fetcher is requesting and calls that activity for processing.

component()

component(string  $component) 

Dynamic loader for Component objects which might live on the current Controller

Parameters

string $component

name of component to return

model()

model(string  $model) 

Dynamic loader for Model objects which might live on the current Controller

Parameters

string $model

name of model to return

plugin()

plugin(string  $plugin) 

Dynamic loader for Plugin objects which might live on the current Controller

Parameters

string $plugin

name of Plugin to return

getIndexingPluginList()

getIndexingPluginList() 

Used to get a list of all available indexing plugins for this Yioop instance.

view()

view(string  $view) 

Dynamic loader for View objects which might live on the current Controller

Parameters

string $view

name of view to return
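The component(), model(), plugin(), and view() loaders share one lazy-loading pattern. An instance is created the first time a name is requested and is then cached, presumably in the matching *_instances array. The Python sketch below shows that pattern. The LazyRegistry class and its factory callback are hypothetical stand-ins for Yioop's class loading.

```python
from typing import Any, Callable, Dict


class LazyRegistry:
    """Create each named object on first request, then return the cached instance."""

    def __init__(self, factory: Callable[[str], Any]) -> None:
        self._factory = factory          # builds an object from its name
        self._instances: Dict[str, Any] = {}

    def get(self, name: str) -> Any:
        if name not in self._instances:
            self._instances[name] = self._factory(name)
        return self._instances[name]


if __name__ == "__main__":
    models = LazyRegistry(lambda name: {"model": name})
    first = models.get("user")
    print(first is models.get("user"))  # the second lookup reuses the cached object
```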

displayView()

displayView(string  $view, array  $data) 

Sends the provided view to output, drawing it with the given data variable and using the current locale for translation and writing mode

Parameters

string $view

the name of the view to draw

array $data

an array of values to use in drawing the view

redirectWithMessage()

redirectWithMessage(string  $message, string  $copy_fields = false, boolean  $restart = false) 

Does a 301 redirect and sets a session variable so that the message is displayed when the redirected page is reached.

Parameters

string $message

message to write

string $copy_fields

$_REQUEST fields to copy for redirect

boolean $restart

if Yioop is being run as its own server rather than under Apache, whether to restart this server.

redirectLocation()

redirectLocation(string  $location) 

Performs a 301 redirect to $location, both when running under a web server and in the CLI setting

Parameters

string $location

url to redirect to

pagingLogic()

pagingLogic(array&  $data, mixed  $field_or_model, string  $output_field, integer  $default_show, array  $search_array = array(), string  $var_prefix = "", array  $args = null) 

When an activity involves displaying tabular data (such as rows of users, groups, etc.), this method might be called to set up the $data fields for next, prev, and page links. It also makes the call to the model to get the row data sorted and restricted as desired. For some data sources, rather than the model being called to get the data, the data might be passed directly to this method.

Parameters

array& $data

used to send data to the view; this method updates it with row and paging data

mixed $field_or_model

if an object, this is assumed to be a model and so the getRows method of this model is called to get row data, sorted and restricted according to $search_array; if a string then the row data is assumed to be in $data[$field_or_model] and pagingLogic itself does the sorting and restricting.

string $output_field

output rows for the view will be stored in $data[$output_field]

integer $default_show

if not specified by $_REQUEST, then this will be used to determine the maximum number of rows that will be written to $data[$output_field]

array $search_array

used to sort and restrict the rows in the getRows call or the data from $data[$field_or_model]. Each element is a quadruple: the name of a field, the comparison to perform, a value to check, and an order (ascending/descending) to sort by

string $var_prefix

if there are multiple uses of pagingLogic presented on the same view, then $var_prefix can be prepended to the $data field variables like num_show, start_row, end_row to distinguish between them

array $args

additional arguments that are passed to getRows, and in turn to selectCallback, fromCallback, and whereCallback, that might provide user_id, etc., to further control which rows are returned
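When $field_or_model is a string, the restrict, sort, and slice steps described above can be sketched in Python as follows. Several details are assumptions made for illustration and may not match Yioop's actual $data keys or comparison vocabulary: the comparison names ("=", "!=", "contains"), the "ASC"/"DESC" order strings, the start_row argument (which stands in for the $_REQUEST value), and the num_total/prev_start/next_start keys. The model branch is omitted.

```python
from typing import Any, Dict, Sequence


def paging_logic(data: Dict[str, Any], field: str, output_field: str, default_show: int,
                 search_array: Sequence[Sequence[Any]] = (), var_prefix: str = "",
                 start_row: int = 0) -> None:
    rows = list(data[field])
    # Restrict: each entry is [field name, comparison, value, sort order].
    # Any other comparison (e.g. "") leaves the rows unrestricted.
    for name, comparison, value, _order in search_array:
        if comparison == "=":
            rows = [r for r in rows if r[name] == value]
        elif comparison == "!=":
            rows = [r for r in rows if r[name] != value]
        elif comparison == "contains":
            rows = [r for r in rows if str(value) in str(r[name])]
    # Sort: apply keys last-to-first so the first entry is the primary key
    # (Python's sort is stable, including with reverse=True).
    for name, _comparison, _value, order in reversed(list(search_array)):
        if order in ("ASC", "DESC"):
            rows.sort(key=lambda r, n=name: r[n], reverse=(order == "DESC"))
    total = len(rows)
    start = max(0, min(start_row, total))
    end = min(start + default_show, total)
    data[output_field] = rows[start:end]
    data[var_prefix + "num_show"] = default_show
    data[var_prefix + "start_row"] = start
    data[var_prefix + "end_row"] = end
    data[var_prefix + "num_total"] = total
    data[var_prefix + "prev_start"] = max(0, start - default_show) if start > 0 else None
    data[var_prefix + "next_start"] = end if end < total else None
```

Applying sorts from the last search entry to the first relies on sort stability, so the first entry in $search_array acts as the primary sort key.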

call()

call(  $activity) 

Used to invoke an activity method of the current controller or one of its components

Parameters

$activity

method to invoke
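A Python sketch of the dispatch that call() performs. It assumes, as the $activity_component description suggests, that an activity not listed there is a method of the controller itself. The DemoController class, its constructor arguments, and the ValueError for unknown activities are hypothetical.

```python
from typing import Any, Dict, Iterable


class DemoController:
    def __init__(self, activities: Iterable[str], activity_component: Dict[str, str],
                 components: Dict[str, Any]) -> None:
        self.activities = set(activities)             # activities this controller supports
        self.activity_component = activity_component  # activity => component name
        self.components = components                  # component name => component object

    def call(self, activity: str) -> Any:
        if activity not in self.activities:
            raise ValueError(f"unsupported activity {activity!r}")
        component_name = self.activity_component.get(activity)
        # Activities mapped to a component run there; the rest run on the controller.
        target = self.components[component_name] if component_name else self
        return getattr(target, activity)()
```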

generateCSRFToken()

generateCSRFToken(string  $user) : string

Generates a cross-site request forgery (CSRF) prevention token based on the provided user name, the current time, and the hidden AUTH_KEY

Parameters

string $user

username to use to generate token

Returns

string —

a csrf token

checkCSRFToken()

checkCSRFToken(string  $token_name, string  $user) : boolean

Checks whether the form's CSRF (cross-site request forgery prevention) token matches the given user and has not expired (tokens expire after 1 hour)

Parameters

string $token_name

attribute of $_REQUEST containing CSRFToken

string $user

user id

Returns

boolean —

whether the CSRF token was valid
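The generate/check pair can be sketched in Python as below. The token layout is an assumption chosen for illustration and is not necessarily Yioop's format: an HMAC-SHA256 digest of the user name and timestamp, keyed by a secret, followed by "|" and the timestamp. AUTH_KEY here is a placeholder value. Only the one-hour lifetime comes from the description above.

```python
import hashlib
import hmac
import time
from typing import Optional

AUTH_KEY = b"placeholder-secret"  # hypothetical stand-in for the hidden AUTH_KEY
TOKEN_LIFETIME = 3600             # one hour, matching the documented expiry


def _digest(user: str, timestamp: int, key: bytes) -> str:
    return hmac.new(key, f"{user}|{timestamp}".encode(), hashlib.sha256).hexdigest()


def generate_csrf_token(user: str, now: Optional[int] = None, key: bytes = AUTH_KEY) -> str:
    timestamp = int(time.time()) if now is None else now
    return f"{_digest(user, timestamp, key)}|{timestamp}"


def check_csrf_token(token: str, user: str, now: Optional[int] = None,
                     key: bytes = AUTH_KEY) -> bool:
    try:
        digest, timestamp_text = token.rsplit("|", 1)  # timestamp is the last field
        timestamp = int(timestamp_text)
    except ValueError:
        return False                                   # malformed token
    current = int(time.time()) if now is None else now
    if timestamp > current or current - timestamp > TOKEN_LIFETIME:
        return False                                   # from the future, or expired
    # compare_digest avoids leaking how many characters matched via timing.
    return hmac.compare_digest(digest, _digest(user, timestamp, key))
```

Under this assumed layout, extracting the timestamp (what getCSRFTime() returns) amounts to reading the field after the last "|".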

checkCSRFTime()

checkCSRFTime(string  $token_name, string  $action = "") : boolean

Checks if the timestamp in $_REQUEST[$token_name] matches the timestamp of the last CSRF token accessed by this user for the kind of activity for which there might be a conflict.

This is to avoid accidental replays of postings, etc., if the back button is used.

Parameters

string $token_name

name of a $_REQUEST field used to hold a CSRF_TOKEN

string $action

name of current action to check for conflicts

Returns

boolean —

whether a conflicting action has occurred.
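One way to detect the back-button replays described above is to remember, per action, the timestamp of the last token accepted. A submission whose token timestamp matches it is then flagged. The ReplayGuard class below is a hypothetical Python sketch. In a real controller this state would be per user, for example kept in the session.

```python
from typing import Dict


class ReplayGuard:
    """Flags a form submission whose token timestamp was already used for the same action."""

    def __init__(self) -> None:
        self._last_time: Dict[str, int] = {}  # action => timestamp of last accepted token

    def is_conflict(self, action: str, token_time: int) -> bool:
        if self._last_time.get(action) == token_time:
            return True  # same token resubmitted, e.g. via the back button
        self._last_time[action] = token_time
        return False
```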

getCSRFTime()

getCSRFTime(string  $token_name) : integer

Used to return just the timestamp portion of the CSRF token

Parameters

string $token_name

name of a $_REQUEST field used to hold a CSRF_TOKEN

Returns

integer —

the timestamp portion of the CSRF_TOKEN

clean()

clean(mixed  $value, mixed  $type, mixed  $default = null) : string

Used to clean strings that might be tainted because they originate from the user

Parameters

mixed $value

tainted data

mixed $type

type of data in $value; can be one of the following strings: bool, color, double, float, int, hash, string, or web-url; or it can be an array listing allowed values. In the latter case, if $value is not in the array and $default is null, the cleaned value will be the first element of the array

mixed $default

if $value is not set, this default value is returned. This isn't used much, since with error_reporting set to E_ALL or -1 you would still get a Notice.

Returns

string —

the clean input matching the type provided
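A partial Python sketch of this behavior. It covers only the int, float/double, bool, and string types plus the allowed-values array case. Several rules are assumptions, not Yioop's exact rules: HTML-escaping for string, 0 or 0.0 on a failed numeric conversion, and the set of truthy strings for bool.

```python
import html
from typing import Any, Sequence, Union


def clean(value: Any, kind: Union[str, Sequence[Any]], default: Any = None) -> Any:
    """Sanitize a user-supplied value according to kind (a subset of the documented types)."""
    if value is None:                  # value not set: fall back to the default
        return default
    if not isinstance(kind, str):      # kind is a list of allowed values
        allowed = list(kind)
        if value in allowed:
            return value
        return allowed[0] if default is None else default
    if kind == "int":
        try:
            return int(value)
        except (TypeError, ValueError):
            return 0 if default is None else default
    if kind in ("float", "double"):
        try:
            return float(value)
        except (TypeError, ValueError):
            return 0.0 if default is None else default
    if kind == "bool":
        return str(value).strip().lower() in ("1", "true", "on", "yes")
    if kind == "string":
        return html.escape(str(value), quote=True)
    raise ValueError(f"type {kind!r} is not covered by this sketch")
```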

convertArrayLines()

convertArrayLines(array  $arr, string  $endline_string = "\n", boolean  $clean = false) : string

Converts an array of lines of strings into a single string with proper newlines, each line having been trimmed and potentially cleaned

Parameters

array $arr

the array of lines to be processed

string $endline_string

what string should be used to indicate the end of a line

boolean $clean

whether to clean each line

Returns

string —

a concatenated string of cleaned lines
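A Python sketch of this conversion, with two assumptions. First, lines are joined with $endline_string, with no terminator after the last line. Second, the boolean $clean is replaced by an optional cleaning function, hypothetically named cleaner, so the example does not depend on clean().

```python
from typing import Callable, Iterable, List, Optional


def convert_array_lines(lines: Iterable[str], endline: str = "\n",
                        cleaner: Optional[Callable[[str], str]] = None) -> str:
    out: List[str] = []
    for line in lines:
        line = line.strip()            # each line is trimmed
        if cleaner is not None:
            line = cleaner(line)       # and optionally cleaned
        out.append(line)
    return endline.join(out)
```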

convertStringCleanArray()

convertStringCleanArray(string  $str, string  $line_type = "url") : array

Cleans a string consisting of lines, typically URLs, into an array of clean lines. This is used in handling data from the crawl options text areas. # marks a comment

Parameters

string $str

contains the url data

string $line_type

does additional cleaning depending on the type of the lines. For instance, if it is "url", then a line not beginning with a URL scheme will have http:// prepended.

Returns

array —

an array of clean lines
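A Python sketch of this cleaning, with two assumptions. A line whose first non-blank character is # is treated as a comment and dropped, so # fragments inside URLs survive. Any line that already starts with a scheme such as https:// or ftp:// is left alone. The regular expression used to detect a scheme is an illustrative choice.

```python
import re
from typing import List

# Illustrative scheme pattern, e.g. "http://", "https://", "ftp://".
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*://")


def convert_string_clean_array(text: str, line_type: str = "url") -> List[str]:
    lines: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):   # skip blank and comment lines
            continue
        if line_type == "url" and not SCHEME_RE.match(line):
            line = "http://" + line            # default scheme for bare hosts/paths
        lines.append(line)
    return lines
```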

checkRequest()

checkRequest() : boolean

Checks whether the request is for a valid activity and whether it uses the correct authorization key

Returns

boolean —

whether the request was valid or not

parsePageHeadVarsView()

parsePageHeadVarsView(object  $view, string  $page_name, string  $page_data) 

Used to set up the head variables and page_data of a wiki or static page associated with a view.

Parameters

object $view

View on which page data will be rendered

string $page_name

a string name/id to associate with page. For example, might have 404 for a page about 404 errors

string $page_data

this is the actual content of a wiki or static page

parsePageHeadVars()

parsePageHeadVars(string  $page_data, boolean  $with_body = false) : array

Used to parse head meta variables out of a data string provided either from a wiki page or a static page. Meta data is stored in lines before the first occurrence of END_HEAD_VARS. Head variables are name=value pairs. An example of a head variable might be:

title = This web page's title

Anything after a semicolon on a line in the head section is treated as a comment.

Parameters

string $page_data

this is the actual content of a wiki or static page

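boolean $with_body

whether to also return the page body; if true, the pair [head vars, page body] is returned rather than just the head variables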

Returns

array —

the associative array of head variables or pair [head vars, page body]
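A Python sketch of the parsing rules described above: name=value pairs appear before the first END_HEAD_VARS, and text after a semicolon is discarded. Two behaviors are assumptions of this sketch: a page with no END_HEAD_VARS marker is treated as having no head section, and leading newlines are dropped from the body.

```python
from typing import Dict, Tuple, Union

END_HEAD_VARS = "END_HEAD_VARS"


def parse_page_head_vars(page_data: str, with_body: bool = False
                         ) -> Union[Dict[str, str], Tuple[Dict[str, str], str]]:
    head_vars: Dict[str, str] = {}
    if END_HEAD_VARS in page_data:
        head, body = page_data.split(END_HEAD_VARS, 1)
    else:
        head, body = "", page_data           # assumption: no marker means no head
    for line in head.splitlines():
        line = line.split(";", 1)[0]         # text after ';' is a comment
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        name = name.strip()
        if name:
            head_vars[name] = value.strip()
    if with_body:
        return head_vars, body.lstrip("\n")  # assumption: drop newlines after the marker
    return head_vars
```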

initializeAdFields()

initializeAdFields(array&  $data, boolean  $ads_off = false) 

If external source advertisements are present in the output of this controller, this function can be used to initialize the field variables used to write the appropriate JavaScript

Parameters

array& $data

data to be used in drawing the view

boolean $ads_off

whether or not ads are turned off so that this method should do nothing

addDifferentialPrivacy()

addDifferentialPrivacy(integer  $actual_value) : integer

Adds epsilon-noise taken from an L_1 Gaussian source to the integer $actual_value, so that the result is centered at $actual_value, yielding an epsilon-private integer value.

Parameters

integer $actual_value

number to make private

Returns

integer —

$fuzzy_value, the number after noise has been added
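Reading "L_1 Gaussian" as the Laplace distribution (the usual noise source for epsilon-differential privacy with sensitivity 1), the method can be sketched in Python as below. The epsilon parameter, the noise scale 1/epsilon, and rounding to the nearest integer are assumptions. The sample is drawn as the difference of two exponential variates, which is Laplace-distributed.

```python
import random
from typing import Optional


def add_differential_privacy(actual_value: int, epsilon: float = 1.0,
                             rng: Optional[random.Random] = None) -> int:
    """Return actual_value plus Laplace(0, 1/epsilon) noise, rounded to an integer."""
    rng = rng if rng is not None else random.Random()
    # The difference of two i.i.d. exponentials with rate epsilon is Laplace with scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return round(actual_value + noise)
```

A smaller epsilon gives larger noise and therefore stronger privacy, at the cost of less accurate counts.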

recordViewSession()

recordViewSession(integer  $page_id, string  $sub_path, string  $media_name) 

Used to store in a session which media list items have been viewed so we can put an indicator by them when the media list is rendered

Parameters

integer $page_id

the id of page with media list

string $sub_path

the resource folder on that page

string $media_name

item for which to store a viewed indicator in the session

getChannel()

getChannel() : integer

Returns the channel used by the given uploaded data

Returns

integer —

channel used

schedule()

schedule() 

Checks if there is a schedule of sites to crawl available and, if so, presents it to the requesting fetcher and then deletes it.

archiveSchedule()

archiveSchedule() 

Checks to see whether there are more pages to extract from the current archive, and if so returns the next batch to the requesting fetcher. The iteration progress is automatically saved on each call to nextPages, so that the next fetcher will get the next batch of pages. If there is no current archive to iterate over, or the iterator has reached the end of the archive, then it indicates that there is no more data by setting the status to NO_DATA_STATE.

checkRestart()

checkRestart(string  $crawl_type) 

Checks if the queue server crawl needs to be restarted. Called when a fetcher sends info that invokes the FetchController's update method (on sending schedule, index, robot, etag, etc. data).

If the crawl that is expected to be running is closed on this queue server, and the check_crawl_time (the last time a fetcher checked the name server to see which crawl was active) is more recent than the time at which it was closed, the crawl is restarted on the current queue server.

Parameters

string $crawl_type

if the crawl is restarted, it is restarted as a crawl of this type. For example, self::WEB_CRAWL or self::ARCHIVE_CRAWL
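The restart condition described above reduces to a small predicate. This is a hypothetical Python sketch with an illustrative function name. Here closed_time is the time the expected crawl was closed on this queue server (None if it is still open), and check_crawl_time is the fetcher's last check against the name server.

```python
from typing import Optional


def should_restart_crawl(closed_time: Optional[int], check_crawl_time: int) -> bool:
    """Restart only a closed crawl that a fetcher has asked about since it was closed."""
    return closed_time is not None and check_crawl_time > closed_time
```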

update()

update() 

Processes Robot, To Crawl, and Index data sent from a fetcher, and acknowledges to the fetcher whether this data was received okay.

handleUploadedData()

handleUploadedData(string  $filename = "") : string

After robot, schedule, and index data have been uploaded and reassembled as one big data file/string, this function splits that string into each of these data types and then saves the results into the appropriate schedule sub-folder. Any temporary files used during uploading are then deleted.

Parameters

string $filename

name of temp file used to upload big string. If uploaded data was small enough to be uploaded in one go, then this should be "" -- the variable $_REQUEST["part"] will be used instead

Returns

string —

$logging, diagnostic info to be sent to the fetcher about what was done

addScheduleToScheduleDirectory()

addScheduleToScheduleDirectory(string  $schedule_name, string&  $data_string) 

Adds a file whose contents are $data_string, and whose name contains an address and a time, to a day subfolder of the directory used for schedules of kind $schedule_name

Parameters

string $schedule_name

the name of the kind of schedule being saved

string& $data_string

encoded, compressed, serialized data the schedule is to contain

crawlTime()

crawlTime() 

Checks for the crawl time according either to crawl_status.txt or to network_status.txt, and presents it to the requesting fetcher, along with a list of available queue servers.

doFetcherCronTasks()

doFetcherCronTasks() 

Used to do periodic maintenance tasks for the Name Server.

For now, just checks if any fetchers which the user turned on have crashed and if so restarts them

getCrawlTimes()

getCrawlTimes() : array

Gets a list of all the timestamps of previously stored crawls

This could probably be moved to the crawl model. It is a little lighter than getCrawlList and should only be used with a name server, so it is left here to avoid confusion.

Returns

array —

list of timestamps