CRON_INTERVAL
Number of seconds that must elapse after the last call before performing cron activities (mainly checking the liveness of fetchers that should be alive)
This class handles data coming to a queue server from a fetcher. It receives the data from the fetcher and saves it into various files for later processing by the queue server.
This class can also be used by a fetcher to get status information.
$web_site : \seekquarry\yioop\library\WebSite
Stores a reference to the web server when Yioop runs in CLI mode; in non-CLI mode, it acts as a request router.
In CLI mode, it is useful for caching files in RAM as they are read.
__construct(\seekquarry\yioop\library\WebSite $web_site = null)
Sets up component activities, instance array, and plugins.
\seekquarry\yioop\library\WebSite | $web_site | the web server when Yioop runs in CLI mode; in non-CLI mode it acts as a request router. In CLI mode, it is useful for caching files in RAM as they are read |
redirectWithMessage(string $message, string $copy_fields = false, boolean $restart = false)
Does a 301 redirect to the given location and sets a session variable so a message is displayed when the user gets there.
string | $message | message to write |
string | $copy_fields | $_REQUEST fields to copy for redirect |
boolean | $restart | if Yioop is being run as its own server rather than under Apache, whether to restart this server |
pagingLogic(array &$data, mixed $field_or_model, string $output_field, integer $default_show, array $search_array = array(), string $var_prefix = "", array $args = null)
When an activity involves displaying tabular data (such as rows of users, groups, etc.), this method can be called to set up $data fields for the next, previous, and page links. It also makes the call to the model to get the row data, sorted and restricted as desired. For some data sources, rather than making a call to the model to get the data, the data might be passed directly to this method.
array& | $data | used to send data to the view; will be updated by this method with row and paging data |
mixed | $field_or_model | if an object, this is assumed to be a model and so the getRows method of this model is called to get row data, sorted and restricted according to $search_array; if a string then the row data is assumed to be in $data[$field_or_model] and pagingLogic itself does the sorting and restricting. |
string | $output_field | output rows for the view will be stored in $data[$output_field] |
integer | $default_show | if not specified by $_REQUEST, then this will be used to determine the maximum number of rows that will be written to $data[$output_field] |
array | $search_array | used to sort and restrict in the getRows call or the data from $data[$field_or_model]. Each element of this is a quadruple name of a field, what comparison to perform, a value to check, and an order (ascending/descending) to sort by |
string | $var_prefix | if there are multiple uses of pagingLogic presented on the same view, then $var_prefix can be prepended to the $data field variables like num_show, start_row, end_row to distinguish between them |
array | $args | additional arguments that are passed to getRows and in turn to selectCallback, fromCallback, and whereCallback that might provide user_id, etc to further control which rows are returned |
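The filter-sort-slice behavior that pagingLogic describes can be sketched in a language-agnostic way. This is an illustrative Python sketch, not the actual PHP implementation: the quadruple format mirrors the $search_array description above, while the function name, the "=" comparison, and the "DESC" order keyword are assumptions made for the example.

```python
def paging_logic(rows, search=None, start_row=0, num_show=10):
    """Sketch of the described paging: restrict, sort, then slice one page.

    `search` is a list of (field, comparison, value, order) quadruples,
    mirroring the $search_array format described above.
    """
    search = search or []
    for field, cmp_op, value, order in search:
        if cmp_op == "=":  # illustrative comparison; skip filter otherwise
            rows = [r for r in rows if r.get(field) == value]
        # sort by this field, ascending or descending
        rows = sorted(rows, key=lambda r: r.get(field),
                      reverse=(order == "DESC"))
    end_row = min(start_row + num_show, len(rows))
    return {"rows": rows[start_row:end_row], "start_row": start_row,
            "end_row": end_row, "num_show": num_show}
```

A caller would then place the returned page under $data[$output_field] and use start_row/end_row to render the prev/next links.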
checkCSRFToken(string $token_name, string $user) : boolean
Checks if the form CSRF (cross-site request forgery prevention) token matches the given user and has not expired (tokens expire after 1 hour)
string | $token_name | attribute of $_REQUEST containing CSRFToken |
string | $user | user id |
whether the CSRF token was valid
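A common way to implement the check described above is to embed the token's creation time in the token itself, so validity and expiry can both be verified. The following Python sketch illustrates that scheme under stated assumptions: the token layout (digest, separator, timestamp), the SHA-256 digest, and the secret parameter are all hypothetical, not taken from the Yioop source.

```python
import hashlib
import time

TOKEN_LIFETIME = 3600  # tokens expire after one hour, as described above

def make_csrf_token(user, secret, now=None):
    """Token embeds its creation time so expiry can be checked later."""
    now = int(now if now is not None else time.time())
    digest = hashlib.sha256(f"{user}{now}{secret}".encode()).hexdigest()
    return f"{digest}*{now}"

def check_csrf_token(token, user, secret, now=None):
    """Recompute the digest for the embedded time and check expiry."""
    now = int(now if now is not None else time.time())
    try:
        digest, issued = token.rsplit("*", 1)
        issued = int(issued)
    except ValueError:
        return False  # malformed token
    if now - issued > TOKEN_LIFETIME:
        return False  # expired
    expected = hashlib.sha256(f"{user}{issued}{secret}".encode()).hexdigest()
    return digest == expected
```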
checkCSRFTime(string $token_name, string $action = "") : boolean
Checks if the timestamp in $_REQUEST[$token_name] matches the timestamp of the last CSRF token accessed by this user for the kind of activity for which there might be a conflict.
This is to avoid accidental replays of postings, etc., if the back button is used.
string | $token_name | name of a $_REQUEST field used to hold a CSRF_TOKEN |
string | $action | name of current action to check for conflicts |
whether a conflicting action has occurred.
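The replay check described above amounts to remembering, per action, the newest token timestamp already seen, and flagging a conflict when a stale one reappears. This is a hedged Python sketch of that idea; the session key "LAST_TOKEN_TIME" and the function shape are illustrative assumptions, not the Yioop implementation.

```python
def check_csrf_time(session, token_timestamp, action):
    """Back-button replay detection sketch: if a token timestamp for this
    action that is newer than or equal to the incoming one has already
    been seen, report a conflict."""
    seen = session.setdefault("LAST_TOKEN_TIME", {})
    last = seen.get(action)
    conflict = last is not None and token_timestamp <= last
    # remember the newest timestamp seen for this action
    seen[action] = max(token_timestamp, last or token_timestamp)
    return conflict
```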
clean(mixed $value, mixed $type, mixed $default = null) : string
Used to clean strings that might be tainted because they originate from the user
mixed | $value | tainted data |
mixed | $type | type of data in value; can be one of the following strings: bool, color, double, float, int, hash, string, or web-url; or it can be an array listing allowed values. If the latter, and the value is not in the array, the cleaned value will be the first element of the array if $default is null |
mixed | $default | default value returned if $value is not set. This isn't used much, since if error_reporting is E_ALL or -1 you would still get a Notice. |
the clean input matching the type provided
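Dispatch-by-type cleaning of this kind can be sketched briefly. The Python below is illustrative only and covers a few of the listed types plus the allowed-values array; the exact coercions Yioop performs (and its handling of color, hash, and web-url) are not reproduced here.

```python
def clean(value, type_, default=None):
    """Sketch of type-driven input cleaning as described above."""
    if value is None:
        return default
    if isinstance(type_, list):  # array of allowed values (whitelist)
        if value in type_:
            return value
        return default if default is not None else type_[0]
    if type_ == "int":
        try:
            return int(value)
        except (TypeError, ValueError):
            return 0
    if type_ == "bool":
        return str(value).lower() in ("1", "true", "on")
    if type_ == "string":
        # neutralize characters that could enable markup injection
        return str(value).replace("<", "&lt;").replace(">", "&gt;")
    return default  # unrecognized type in this sketch
```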
convertArrayLines(array $arr, string $endline_string = "\n", boolean $clean = false) : string
Converts an array of lines of strings into a single string with proper newlines, each line having been trimmed and potentially cleaned
array | $arr | the array of lines to be processed |
string | $endline_string | what string should be used to indicate the end of a line |
boolean | $clean | whether to clean each line |
a concatenated string of cleaned lines
convertStringCleanArray(string $str, string $line_type = "url") : array
Cleans a string consisting of lines, typically of urls, into an array of clean lines. This is used in handling data from the crawl options text areas. # is treated as a comment.
string | $str | contains the url data |
string | $line_type | does additional cleaning depending on the type of the lines. For instance, if it is "url", then a line not beginning with a url scheme will have http:// prepended. |
an array of clean lines
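The cleaning just described can be sketched as follows. This Python sketch assumes # comments occupy whole lines and that "has a scheme" means the line contains "://"; both are simplifying assumptions for illustration, not necessarily what Yioop's PHP code does.

```python
def convert_string_clean_array(s, line_type="url"):
    """Split a textarea blob into clean lines: trim each line, skip blank
    lines and # comment lines, and for line_type == "url" prepend http://
    to lines lacking a url scheme."""
    lines = []
    for line in s.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line_type == "url" and "://" not in line:
            line = "http://" + line
        lines.append(line)
    return lines
```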
parsePageHeadVarsView(object $view, string $page_name, string $page_data)
Used to set up the head variables for, and the page_data of, a wiki or static page associated with a view.
object | $view | View on which page data will be rendered |
string | $page_name | a string name/id to associate with page. For example, might have 404 for a page about 404 errors |
string | $page_data | this is the actual content of a wiki or static page |
parsePageHeadVars(string $page_data, $with_body = false) : array
Used to parse head meta variables out of a data string provided either from a wiki page or a static page. Meta data is stored in lines before the first occurrence of END_HEAD_VARS. Head variables are name=value pairs. An example head variable might be: title = This web page's title. Anything after a semicolon on a line in the head section is treated as a comment.
string | $page_data | this is the actual content of a wiki or static page |
boolean | $with_body | whether to also return the page body (the portion after END_HEAD_VARS) |
the associative array of head variables or pair [head vars, page body]
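The parsing rules above (name=value lines before END_HEAD_VARS, semicolons starting comments, an optional body return) can be sketched like this. The sketch is illustrative Python, not the Yioop implementation, and assumes that when END_HEAD_VARS is absent the whole string is body.

```python
END_HEAD_VARS = "END_HEAD_VARS"

def parse_page_head_vars(page_data, with_body=False):
    """Parse name=value head variables before END_HEAD_VARS; anything
    after a ';' on a head line is a comment, as described above."""
    head, sep, body = page_data.partition(END_HEAD_VARS)
    if not sep:  # no marker: no head section, everything is body
        head, body = "", page_data
    head_vars = {}
    for line in head.splitlines():
        line = line.split(";", 1)[0]  # drop the comment portion
        if "=" in line:
            name, value = line.split("=", 1)
            head_vars[name.strip()] = value.strip()
    return (head_vars, body) if with_body else head_vars
```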
initializeAdFields(array &$data, boolean $ads_off = false)
If external-source advertisements are present in the output of this controller, this function can be used to initialize the field variables used to write the appropriate JavaScript
array& | $data | data to be used in drawing the view |
boolean | $ads_off | whether or not ads are turned off so that this method should do nothing |
addDifferentialPrivacy(integer $actual_value) : integer
Adds to an integer, $actual_value, epsilon-noise taken from an L_1 Gaussian (Laplace) source centered at $actual_value, to get an epsilon-private integer value.
integer | $actual_value | the number to make private |
$fuzzy_value, the number after noise has been added
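An L_1 Gaussian centered at a value is the Laplace distribution, which can be sampled by inverse-CDF from a uniform variable. The sketch below shows the standard technique; the epsilon default, the clamp against log(0), and the function shape are illustrative assumptions, not Yioop's exact code.

```python
import math
import random

def add_differential_privacy(actual_value, epsilon=1.0, rng=random.random):
    """Add Laplace (L_1 Gaussian) noise with scale 1/epsilon centered at
    actual_value; smaller epsilon means more noise and stronger privacy.
    Uses inverse-CDF sampling: X = mu - sgn(u) * ln(1 - 2|u|) / epsilon
    for u uniform in (-1/2, 1/2)."""
    u = rng() - 0.5
    u = max(min(u, 0.499999), -0.499999)  # avoid log(0) at the extremes
    noise = -math.copysign(1.0, u) * math.log(1 - 2 * abs(u)) / epsilon
    return int(round(actual_value + noise))
```

Passing a deterministic rng makes the noise reproducible, which is handy for testing but defeats the privacy guarantee in production.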
recordViewSession(integer $page_id, string $sub_path, string $media_name)
Used to store in a session which media list items have been viewed so we can put an indicator by them when the media list is rendered
integer | $page_id | the id of page with media list |
string | $sub_path | the resource folder on that page |
string | $media_name | the item to store an indicator for in the session |
archiveSchedule()
Checks to see whether there are more pages to extract from the current archive, and if so returns the next batch to the requesting fetcher. The iteration progress is automatically saved on each call to nextPages, so that the next fetcher will get the next batch of pages. If there is no current archive to iterate over, or the iterator has reached the end of the archive then indicate that there is no more data by setting the status to NO_DATA_STATE.
checkRestart(string $crawl_type)
Checks if the queue server crawl needs to be restarted. Called when a fetcher sends info that invokes the FetchController's update method (on sending schedule, index, robot, etag, etc. data).
If the crawl that is expected to be running is closed on this queue server, and the check_crawl_time (the last time the fetcher checked the name server to see what the active crawl was) is more recent than the time at which it was closed, restart the crawl on the current queue server.
string | $crawl_type | if a restart is needed, restart the crawl as a crawl of this type, for example, self::WEB_CRAWL or self::ARCHIVE_CRAWL |
handleUploadedData(string $filename = "") : string
After robot, schedule, and index data have been uploaded and reassembled as one big data file/string, this function splits that string into each of these data types and then saves the result into the appropriate schedule sub-folder. Any temporary files used during uploading are then deleted.
string | $filename | name of temp file used to upload big string. If uploaded data was small enough to be uploaded in one go, then this should be "" -- the variable $_REQUEST["part"] will be used instead |
$logging, diagnostic info to be sent to the fetcher about what was done
addScheduleToScheduleDirectory(string $schedule_name, string &$data_string)
Adds a file with contents $data and with name containing $address and $time to a subfolder $day of a folder $dir
string | $schedule_name | the name of the kind of schedule being saved |
string& | $data_string | encoded, compressed, serialized data the schedule is to contain |
getCrawlTimes() : array
Gets a list of all the timestamps of previously stored crawls
This could probably be moved to the crawl model. It is a little lighter-weight than getCrawlList and should only be used with a name server, so it is left here to avoid confusion.
list of timestamps