\seekquarry\yioop\controllers\ClassifierController

This class handles XmlHttpRequests to label documents during classifier construction.

Searching for new documents to label and add to the training set is a heavily interactive operation, so it is implemented with asynchronous requests to this controller, which fetch candidates for labeling and add labels without reloading the classifier edit page. The admin controller first displays the "edit classifier" page and handles requests to change a classifier's class label; this controller handles the other asynchronous requests issued by the JavaScript on the page.

Summary

Public methods

__construct()
processRequest()
component()
model()
plugin()
getIndexingPluginList()
view()
displayView()
redirectWithMessage()
redirectLocation()
pagingLogic()
call()
generateCSRFToken()
checkCSRFToken()
checkCSRFTime()
getCSRFTime()
clean()
convertArrayLines()
convertStringCleanArray()
checkRequest()
parsePageHeadVarsView()
parsePageHeadVars()
initializeAdFields()
addDifferentialPrivacy()
recordViewSession()
classify()
buildClassifierCrawlMix()
retrieveClassifierCrawlMix()
prepareUnlabelledDocument()

Public properties

$web_site
$component_instances
$view_instances
$model_instances
$plugin_instances
$activities
$activity_component
$component_activities

Constants: none found
Protected methods and properties: none found
Private methods and properties: none found

Properties

$web_site

$web_site : \seekquarry\yioop\library\WebSite

Stores a reference to the web server when Yioop runs in CLI mode; in non-CLI mode it acts as a request router.

In CLI mode, it is useful for caching files in RAM as they are read.

Type

\seekquarry\yioop\library\WebSite

$component_instances

$component_instances : array

Array of instances of components used by this controller

Type

array

$view_instances

$view_instances : array

Array of instances of views used by this controller

Type

array

$model_instances

$model_instances : array

Array of instances of models used by this controller

Type

array

$plugin_instances

$plugin_instances : array

Array of instances of indexing_plugins used by this controller

Type

array

$activities

$activities : array

These are the activities supported by this controller

Type

array

$activity_component

$activity_component : array

Associative array of activity => component the activity is on; used by the Controller::call method to actually invoke a given activity on a given component

Type

array

$component_activities

$component_activities : array

Associative array of component => activities for this controller. Components are collections of activities (a little like traits) which can be reused.

Type

array

Methods

__construct()

__construct(\seekquarry\yioop\library\WebSite  $web_site = null) 

Sets up component activities, instance array, and plugins.

Parameters

\seekquarry\yioop\library\WebSite $web_site

is the web server when Yioop runs in CLI mode; in non-CLI mode it acts as a request router. In CLI mode, it is useful for caching files in RAM as they are read.

processRequest()

processRequest() 

Checks that the request seems to be coming from a legitimate, logged-in user, then dispatches to the appropriate activity.

component()

component(string  $component) 

Dynamic loader for Component objects which might live on the current Controller

Parameters

string $component

name of component to return

model()

model(string  $model) 

Dynamic loader for Model objects which might live on the current Controller

Parameters

string $model

name of model to return

plugin()

plugin(string  $plugin) 

Dynamic loader for Plugin objects which might live on the current Controller

Parameters

string $plugin

name of Plugin to return

getIndexingPluginList()

getIndexingPluginList() 

Used to get a list of all available indexing plugins for this Yioop instance.

view()

view(string  $view) 

Dynamic loader for View objects which might live on the current Controller

Parameters

string $view

name of view to return

displayView()

displayView(string  $view, array  $data) 

Sends the provided view to output, drawing it with the given data variable and using the current locale for translation and writing mode

Parameters

string $view

the name of the view to draw

array $data

an array of values to use in drawing the view

redirectWithMessage()

redirectWithMessage(string  $message, string  $copy_fields = false, boolean  $restart = false) 

Does a 301 redirect to the given location and sets a session variable so that a message is displayed on arrival.

Parameters

string $message

message to write

string $copy_fields

$_REQUEST fields to copy for redirect

boolean $restart

whether to restart the server, in the case where Yioop is being run as its own server rather than under Apache.

redirectLocation()

redirectLocation(string  $location) 

Performs a 301 redirect to $location, both under a web server and in the CLI setting

Parameters

string $location

url to redirect to

pagingLogic()

pagingLogic(array&  $data, mixed  $field_or_model, string  $output_field, integer  $default_show, array  $search_array = array(), string  $var_prefix = "", array  $args = null) 

When an activity involves displaying tabular data (such as rows of users, groups, etc), this method might be called to set up $data fields for next, prev, and page links. It also makes the call to the model to get the row data, sorted and restricted as desired. For some data sources, rather than making a call to the model to get the data, the data might be passed directly to this method.

Parameters

array& $data

used to send data to the view; it will be updated by this method with row and paging data

mixed $field_or_model

if an object, this is assumed to be a model and so the getRows method of this model is called to get row data, sorted and restricted according to $search_array; if a string then the row data is assumed to be in $data[$field_or_model] and pagingLogic itself does the sorting and restricting.

string $output_field

output rows for the view will be stored in $data[$output_field]

integer $default_show

if not specified by $_REQUEST, then this will be used to determine the maximum number of rows that will be written to $data[$output_field]

array $search_array

used to sort and restrict in the getRows call or the data from $data[$field_or_model]. Each element of this is a quadruple: the name of a field, what comparison to perform, a value to check, and an order (ascending/descending) to sort by

string $var_prefix

if there are multiple uses of pagingLogic presented on the same view, then $var_prefix can be prepended to the $data field variables like num_show, start_row, end_row to distinguish between them

array $args

additional arguments that are passed to getRows and in turn to selectCallback, fromCallback, and whereCallback that might provide user_id, etc to further control which rows are returned
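
The paging arithmetic described above can be illustrated with a minimal sketch for the case where the rows are already in memory. It is not Yioop's implementation: the function name sketchPage and the keys total_rows, prev_start, and next_start are hypothetical, and only start_row, end_row, and the number of rows to show come from the description above.

```php
<?php
/**
 * Hypothetical helper: returns one page of $rows plus the values a view
 * would need to draw prev/next links.
 */
function sketchPage(array $rows, int $start_row, int $num_show): array
{
    $total = count($rows);
    // Clamp the requested start row into the valid range.
    $start_row = max(0, min($start_row, max(0, $total - 1)));
    $end_row = min($start_row + $num_show, $total);
    return [
        "rows" => array_slice($rows, $start_row, $num_show),
        "start_row" => $start_row,
        "end_row" => $end_row,
        "total_rows" => $total,
        // Start row for the previous page, never below zero.
        "prev_start" => max(0, $start_row - $num_show),
        // Start row for the next page; unchanged when already on the last page.
        "next_start" => ($end_row < $total) ? $end_row : $start_row,
    ];
}
```

For 25 rows, a start row of 10, and 10 rows per page, the sketch returns rows 11 through 20 with end_row equal to 20.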

call()

call(  $activity) 

Used to invoke an activity method of the current controller or one of its components

Parameters

$activity

method to invoke
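
As a rough illustration of how an activity => component map such as $activity_component could drive this dispatch, here is a minimal sketch. The classes SketchController and SketchComponent, the activity manageUsers, and the component key accountaccess are all hypothetical, and whether the real method returns a value is an assumption.

```php
<?php
// Hypothetical component offering one activity.
class SketchComponent
{
    public function manageUsers()
    {
        return "component:manageUsers";
    }
}

// Hypothetical controller with one activity of its own.
class SketchController
{
    public $activity_component = ["manageUsers" => "accountaccess"];
    public $component_instances = [];

    public function __construct()
    {
        $this->component_instances["accountaccess"] = new SketchComponent();
    }

    public function classify()
    {
        return "controller:classify";
    }

    public function call($activity)
    {
        // If the activity lives on a component, invoke it there.
        if (isset($this->activity_component[$activity])) {
            $name = $this->activity_component[$activity];
            return $this->component_instances[$name]->$activity();
        }
        // Otherwise invoke the method on the controller itself.
        return $this->$activity();
    }
}
```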

generateCSRFToken()

generateCSRFToken(string  $user) : string

Generates a cross site request forgery preventing token based on the provided user name, the current time and the hidden AUTH_KEY

Parameters

string $user

username to use to generate token

Returns

string —

a csrf token

checkCSRFToken()

checkCSRFToken(string  $token_name, string  $user) : boolean

Checks if the form CSRF (cross-site request forgery preventing) token matches the given user and has not expired (tokens expire after one hour)

Parameters

string $token_name

attribute of $_REQUEST containing CSRFToken

string $user

user id

Returns

boolean —

whether the CSRF token was valid
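
The general idea of a token built from a user name, a timestamp, and a secret key, and rejected after one hour, can be sketched as follows. This is an assumption-laden illustration rather than Yioop's token format: the constants SKETCH_AUTH_KEY and SKETCH_TOKEN_LIFETIME, both function names, the choice of HMAC-SHA256, and the "mac|timestamp" layout are all hypothetical.

```php
<?php
// Hypothetical stand-in for the hidden AUTH_KEY.
const SKETCH_AUTH_KEY = "replace-with-a-secret";
// One hour, matching the expiry stated above.
const SKETCH_TOKEN_LIFETIME = 3600;

function sketchGenerateToken(string $user, ?int $time = null): string
{
    $time = $time ?? time();
    // Bind the token to both the user and the time it was issued.
    $mac = hash_hmac("sha256", $user . "|" . $time, SKETCH_AUTH_KEY);
    return $mac . "|" . $time;
}

function sketchCheckToken(string $token, string $user, ?int $now = null): bool
{
    $now = $now ?? time();
    $parts = explode("|", $token);
    if (count($parts) !== 2 || !ctype_digit($parts[1])) {
        return false;
    }
    $time = (int)$parts[1];
    // Reject expired tokens and tokens claiming to come from the future.
    if ($time > $now || $now - $time > SKETCH_TOKEN_LIFETIME) {
        return false;
    }
    $expected = hash_hmac("sha256", $user . "|" . $time, SKETCH_AUTH_KEY);
    // hash_equals compares in constant time to avoid timing leaks.
    return hash_equals($expected, $parts[0]);
}
```

Passing the time explicitly is only for making the sketch easy to exercise; a token issued for one user fails the check for any other user, and fails for everyone once the lifetime has passed.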

checkCSRFTime()

checkCSRFTime(string  $token_name, string  $action = "") : boolean

Checks if the timestamp in $_REQUEST[$token_name] matches the timestamp of the last CSRF token accessed by this user for the kind of activity for which there might be a conflict.

This is to avoid accidental replays of postings, etc., if the back button is used.

Parameters

string $token_name

name of a $_REQUEST field used to hold a CSRF_TOKEN

string $action

name of current action to check for conflicts

Returns

boolean —

whether a conflicting action has occurred.

getCSRFTime()

getCSRFTime(string  $token_name) : integer

Used to return just the timestamp portion of the CSRF token

Parameters

string $token_name

name of a $_REQUEST field used to hold a CSRF_TOKEN

Returns

integer —

the timestamp portion of the CSRF_TOKEN

clean()

clean(mixed  $value, mixed  $type, mixed  $default = null) : string

Used to clean strings that might be tainted because they originate from the user

Parameters

mixed $value

tainted data

mixed $type

type of data in $value; can be one of the following strings: bool, color, double, float, int, hash, string, or web-url; or it can be an array listing allowed values. In the latter case, if the value is not in the array, the cleaned value will be the first element of the array if $default is null

mixed $default

if $value is not set default value is returned, this isn't used much since if the error_reporting is E_ALL or -1 you would still get a Notice.

Returns

string —

the clean input matching the type provided
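
A minimal sketch of this kind of type-driven cleaning is given below, covering only a few of the listed types. The function name sketchClean is hypothetical, and the particular conversions chosen (intval, floatval, htmlspecialchars, filter_var) are assumptions about reasonable behavior, not a description of what Yioop does internally.

```php
<?php
function sketchClean($value, $type, $default = null)
{
    // An array type is treated as a whitelist of allowed values.
    if (is_array($type)) {
        if (in_array($value, $type, true)) {
            return $value;
        }
        // Fall back to the first allowed value when no default is given.
        return $default ?? reset($type);
    }
    switch ($type) {
        case "bool":
            return filter_var($value, FILTER_VALIDATE_BOOLEAN);
        case "int":
            return intval($value);
        case "float":
        case "double":
            return floatval($value);
        case "string":
            // Escape HTML metacharacters so the value is safe to echo.
            return htmlspecialchars((string)$value, ENT_QUOTES, "UTF-8");
    }
    // Types not handled by this sketch fall through to the default.
    return $default;
}
```

For example, sketchClean("x", ["asc", "desc"]) falls back to "asc", while sketchClean("desc", ["asc", "desc"]) returns "desc" unchanged.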

convertArrayLines()

convertArrayLines(array  $arr, string  $endline_string = "\n", boolean  $clean = false) : string

Converts an array of lines of strings into a single string with proper newlines, each line having been trimmed and potentially cleaned

Parameters

array $arr

the array of lines to be processed

string $endline_string

what string should be used to indicate the end of a line

boolean $clean

whether to clean each line

Returns

string —

a concatenated string of cleaned lines

convertStringCleanArray()

convertStringCleanArray(string  $str, string  $line_type = "url") : array

Cleans a string consisting of lines, typically of urls into an array of clean lines. This is used in handling data from the crawl options text areas. # is treated as a comment

Parameters

string $str

contains the url data

string $line_type

does additional cleaning depending on the type of the lines. For instance, if it is "url" then a line not beginning with a url scheme will have http:// prepended.

Returns

array —

an array of clean lines
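
The behavior described for "url" lines can be sketched roughly as below. The function name sketchCleanLines is hypothetical, and the sketch assumes that only lines beginning with # count as comments and that blank lines are dropped; the real method may treat these cases differently.

```php
<?php
function sketchCleanLines(string $str, string $line_type = "url"): array
{
    $lines = [];
    // \R matches any newline sequence (\n, \r\n, or \r).
    foreach (preg_split('/\R/', $str) as $line) {
        $line = trim($line);
        // Skip blank lines and (by assumption) lines starting with #.
        if ($line === "" || $line[0] === "#") {
            continue;
        }
        // Prepend http:// when a url line has no scheme of its own.
        if ($line_type === "url" &&
            !preg_match('/^[a-z][a-z0-9+.\-]*:\/\//i', $line)) {
            $line = "http://" . $line;
        }
        $lines[] = $line;
    }
    return $lines;
}
```

Given the text "# seeds", "example.com", and " https://a.org/x " on separate lines, the sketch yields the two entries http://example.com and https://a.org/x.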

checkRequest()

checkRequest() : boolean

Checks whether a request is for a valid activity and whether it uses the correct authorization key

Returns

boolean —

whether the request was valid or not

parsePageHeadVarsView()

parsePageHeadVarsView(object  $view, string  $page_name, string  $page_data) 

Used to set up the head variables for and page_data of a wiki or static page associated with a view.

Parameters

object $view

View on which page data will be rendered

string $page_name

a string name/id to associate with page. For example, might have 404 for a page about 404 errors

string $page_data

this is the actual content of a wiki or static page

parsePageHeadVars()

parsePageHeadVars(string  $page_data,   $with_body = false) : array

Used to parse head meta variables out of a data string provided either from a wiki page or a static page. Meta data is stored in lines before the first occurrence of END_HEAD_VARS. Head variables are name=value pairs. An example of a head variable might be: title = This web page's title. Anything after a semi-colon on a line in the head section is treated as a comment.

Parameters

string $page_data

this is the actual content of a wiki or static page

$with_body

whether to also return the page body along with the head variables

Returns

array —

the associative array of head variables or pair [head vars, page body]
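
The head-variable format described above can be parsed along the following lines. This is a sketch under assumptions, not the actual method: the name sketchParseHeadVars is hypothetical, and it assumes that a page without END_HEAD_VARS has no head variables and that the body is everything after the first END_HEAD_VARS.

```php
<?php
function sketchParseHeadVars(string $page_data, bool $with_body = false): array
{
    // Split once, at the first occurrence of the marker.
    $parts = explode("END_HEAD_VARS", $page_data, 2);
    $head_vars = [];
    $body = $page_data;
    if (count($parts) === 2) {
        $body = $parts[1];
        foreach (preg_split('/\R/', $parts[0]) as $line) {
            // Anything after a semi-colon is a comment.
            $line = explode(";", $line, 2)[0];
            $pair = explode("=", $line, 2);
            if (count($pair) === 2 && trim($pair[0]) !== "") {
                $head_vars[trim($pair[0])] = trim($pair[1]);
            }
        }
    }
    return $with_body ? [$head_vars, $body] : $head_vars;
}
```

A page whose head section contains the lines "title = My page ; a comment" and "author=Ann" would give the array with title set to "My page" and author set to "Ann".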

initializeAdFields()

initializeAdFields(array&  $data, boolean  $ads_off = false) 

If external source advertisements are present in the output of this controller, this function can be used to initialize the field variables used to write the appropriate JavaScript

Parameters

array& $data

data to be used in drawing the view

boolean $ads_off

whether or not ads are turned off so that this method should do nothing

addDifferentialPrivacy()

addDifferentialPrivacy(integer  $actual_value) : integer

Adds to an integer, $actual_value, epsilon-noise taken from an L_1 gaussian source centered at $actual_value to get an epsilon-private integer value.

Parameters

integer $actual_value

number to make private

Returns

integer —

$fuzzy_value number after noise added
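
One common reading of an "L_1 gaussian" is the Laplace distribution, whose density decays with the absolute (L_1) distance from its center. Under that assumption, and assuming a sensitivity of 1 so that the noise scale is 1/epsilon, the idea can be sketched as follows. The names sketchLaplaceNoise and sketchAddPrivacy and the constant SKETCH_EPSILON are hypothetical, and mt_rand is used only for brevity.

```php
<?php
// Hypothetical privacy parameter; smaller values mean more noise.
const SKETCH_EPSILON = 0.5;

function sketchLaplaceNoise(float $scale): float
{
    // Draw u uniformly from the open interval (-0.5, 0.5).
    $u = (mt_rand(1, mt_getrandmax() - 1) / mt_getrandmax()) - 0.5;
    $sign = ($u < 0) ? -1 : 1;
    // Inverse-CDF sampling of a Laplace distribution centered at zero.
    return -$scale * $sign * log(1 - 2 * abs($u));
}

function sketchAddPrivacy(int $actual_value,
    float $epsilon = SKETCH_EPSILON): int
{
    // Centering the noise at $actual_value is the same as adding
    // zero-centered noise to it; round to get back an integer.
    return (int)round($actual_value + sketchLaplaceNoise(1 / $epsilon));
}
```

Because the noise is random, repeated calls with the same input generally return different values; as epsilon grows the noise shrinks and the result approaches the input.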

recordViewSession()

recordViewSession(integer  $page_id, string  $sub_path, string  $media_name) 

Used to store in a session which media list items have been viewed so we can put an indicator by them when the media list is rendered

Parameters

integer $page_id

the id of page with media list

string $sub_path

the resource folder on that page

string $media_name

item to store an indicator in the session for

classify()

classify() 

Finds the next document for which to request a label, sometimes first recording the label that the user selected for the last document. This method should only be called via an XmlHttpRequest initiated by the edit classifier JavaScript, and consequently it always writes out JSON-encoded data, which is easily decoded by the page JavaScript.

buildClassifierCrawlMix()

buildClassifierCrawlMix(string  $label, integer  $crawl_time, string  $keywords) : object

Creates a new crawl mix for an existing index, with an optional query, and returns an iterator for the mix. The crawl mix name is derived from the class label, so that it can be easily retrieved and deleted later on.

Parameters

string $label

class label of the classifier the new crawl mix will be associated with

integer $crawl_time

timestamp of the index to be iterated over

string $keywords

an optional query used to restrict the pages retrieved by the crawl mix

Returns

object —

A MixArchiveBundleIterator instance that will iterate over the pages of the requested index

retrieveClassifierCrawlMix()

retrieveClassifierCrawlMix(string  $label) : object

Retrieves an iterator for an existing crawl mix. The crawl mix remembers its previous offset, so that the new iterator picks up where the previous one left off.

Parameters

string $label

class label of the classifier this crawl mix is associated with

Returns

object —

new MixArchiveBundleIterator instance that picks up where the previous one left off

prepareUnlabelledDocument()

prepareUnlabelledDocument(array  $page, float  $score, float  $disagreement, integer  $crawl_time, string  $keywords) : array

Creates a fresh array from an existing page summary array, and augments it with extra data relevant to the labeling interface on the client.

Parameters

array $page

original page summary array

float $score

classification score (estimated by the Naive Bayes text classification algorithm) for $page

float $disagreement

disagreement score computed for $page

integer $crawl_time

index the page came from

string $keywords

query supplied to the crawl mix used to find $page

Returns

array —

reduced page summary structure containing only the information that the client needs to display a summary of the page