\seekquarry\yioop\controllers\components\CrawlComponent

This component provides the admin controller's activities for configuring and performing a web or archive crawl.

Summary

Public methods: __construct(), initializeWikiEditor(), manageCrawls(), editMix(), mixCrawls(), startCrawl(), getCrawlParametersFromSeedInfo(), editCrawlOption(), crawlStatistics(), manageClassifiers(), editClassifier(), pageOptions(), scrapers(), resultsEditor(), searchSources()
Public properties: $parent
Constants: MAX_MIX_FRAGMENTS
Protected methods: none found
Protected properties: none found
Private methods: none found
Private properties: none found

Constants

MAX_MIX_FRAGMENTS

MAX_MIX_FRAGMENTS

Maximum number of search result fragments in a crawl mix

Properties

$parent

$parent : object

Reference to the controller this component lives on

Type

object

Methods

__construct()

__construct(object  $parent_controller) 

Sets up this component by storing in its parent field a reference to the controller this component lives on

Parameters

object $parent_controller

reference to the controller this component lives on

initializeWikiEditor()

initializeWikiEditor(array &$data, $id = "") 

Called to include the JavaScript Wiki Editor (wiki.js) on a page and to send any localizations needed from PHP to JavaScript. It is used by both CrawlComponent and SocialComponent.

Parameters

array &$data

an associative array of data to be used by the view and layout that the wiki editor will be drawn on. This method appends to INCLUDE_SCRIPTS to make the layout load wiki.js.

$id

if "", all textareas on the page get editor buttons; if -1, translations are set up but no buttons are added; otherwise, buttons are added to the textarea with id $id. (This method can be called multiple times if more than one, but not all, textareas should get buttons.)
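The three cases for $id, and the by-reference update of INCLUDE_SCRIPTS, can be sketched as follows. This is only an illustration of the behaviour described above, not Yioop's implementation: the function names wikiEditorMode() and addWikiScript(), the returned mode strings, and the script name "wiki" are all assumptions made for the example.

```php
<?php
/**
 * Hypothetical helper (not part of Yioop): maps the $id argument to
 * one of the three behaviours described for initializeWikiEditor().
 */
function wikiEditorMode($id = "")
{
    if ($id === "") {
        // every textarea on the page gets editor buttons
        return "all-textareas";
    }
    if ($id === -1 || $id === "-1") {
        // translations are sent, but no buttons are added
        return "translations-only";
    }
    // buttons are added only to the textarea with this id
    return "single-textarea:" . $id;
}

/**
 * Hypothetical stand-in for appending the editor script to the
 * INCLUDE_SCRIPTS entry of a by-reference $data array.
 */
function addWikiScript(array &$data)
{
    if (!isset($data["INCLUDE_SCRIPTS"])) {
        $data["INCLUDE_SCRIPTS"] = [];
    }
    // avoid a duplicate entry when called more than once
    if (!in_array("wiki", $data["INCLUDE_SCRIPTS"], true)) {
        $data["INCLUDE_SCRIPTS"][] = "wiki";
    }
}
```

Because $data is taken by reference, repeated calls (one per textarea id) accumulate into the same array without the caller reassigning it.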

manageCrawls()

manageCrawls() : array

Used to handle the manage crawl activity.

This activity allows new crawls to be started and statistics about old crawls to be seen. It allows a user to stop the current crawl or restart an old crawl. It also allows a user to configure the options by which a crawl is conducted.

Returns

array —

$data information and statistics about crawls in the system as well as status messages on performing a given sub activity

editMix()

editMix(array  $data) 

Handles admin requests related to the edit crawl mix activity

Parameters

array $data

info about the fragments and their contents for a particular crawl mix (changed by this method)

mixCrawls()

mixCrawls() : array

Handles admin request related to the crawl mix activity

The crawl mix activity allows a user to create/edit crawl mixes: weighted combinations of search indexes

Returns

array —

$data info about available crawl mixes and changes to them as well as any messages about the success or failure of a sub activity.
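As a rough illustration of what "weighted combinations of search indexes" could mean in practice, the sketch below scores a document by summing per-index scores multiplied by their weights, and caps the number of fragments in a mix. The array layout, the function names, and the value of EXAMPLE_MAX_FRAGMENTS are assumptions for this example; this page does not state the actual value of MAX_MIX_FRAGMENTS or how Yioop stores a mix.

```php
<?php
// Illustrative cap only; the real MAX_MIX_FRAGMENTS value is not given here.
const EXAMPLE_MAX_FRAGMENTS = 3;

/**
 * Hypothetical: combines per-index scores using the weights of a
 * fragment's components. Indexes with no score contribute 0.
 */
function mixScore(array $components, array $scores_by_index): float
{
    $total = 0.0;
    foreach ($components as $component) {
        $index = $component["index"];
        $total += $component["weight"] * ($scores_by_index[$index] ?? 0.0);
    }
    return $total;
}

/**
 * Hypothetical: keeps at most $max fragments of a mix, dropping the rest.
 */
function limitFragments(array $fragments, int $max = EXAMPLE_MAX_FRAGMENTS): array
{
    return array_slice($fragments, 0, $max);
}

$components = [
    ["index" => "web_crawl", "weight" => 2.0],
    ["index" => "news_crawl", "weight" => 0.5],
];
$score = mixScore($components, ["web_crawl" => 1.0, "news_crawl" => 4.0]);
```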

startCrawl()

startCrawl(array &$data, array $request_fields) 

Called from manageCrawls() to start a new crawl on the machines $machine_urls. Updates the $data array with a crawl start message.

Parameters

array &$data

an array of info to supply to AdminView

array $request_fields

if starting the crawl fails, this is a list of request fields to preserve in the redirect message

getCrawlParametersFromSeedInfo()

getCrawlParametersFromSeedInfo(array &$crawl_params, array $seed_info) 

Reads the parameters for a crawl from an array obtained from a crawl.ini file

Parameters

array &$crawl_params

parameters to write to queue_server

array $seed_info

data from crawl.ini file
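The general pattern of filling a by-reference parameter array from parsed ini data can be sketched with PHP's built-in parse_ini_string(). The section names, keys, and the copySeedInfo() helper below are hypothetical and chosen only for illustration; they are not taken from Yioop's crawl.ini format or from this method's implementation.

```php
<?php
/**
 * Hypothetical helper: copies selected values from parsed ini data
 * into the by-reference $crawl_params array, leaving existing
 * entries and missing keys untouched.
 */
function copySeedInfo(array &$crawl_params, array $seed_info): void
{
    // assumed mapping: parameter name => [ini section, ini key]
    $map = [
        "crawl_order" => ["general", "crawl_order"],
        "seed_sites" => ["seed_sites", "url"],
    ];
    foreach ($map as $param => [$section, $key]) {
        if (isset($seed_info[$section][$key])) {
            $crawl_params[$param] = $seed_info[$section][$key];
        }
    }
}

// Example ini text; the layout is an assumption for this sketch.
$ini = <<<INI
[general]
crawl_order = "breadth_first"

[seed_sites]
url[] = "https://www.example.com/"
url[] = "https://www.example.org/"
INI;

// true => keep the [section] structure in the resulting array
$seed_info = parse_ini_string($ini, true);
$crawl_params = ["existing" => "kept"];
copySeedInfo($crawl_params, $seed_info);
```

Passing $crawl_params by reference mirrors the signature above: the caller's array is extended in place rather than returned.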

editCrawlOption()

editCrawlOption(array &$data, array $machine_urls) 

Called from manageCrawls() to edit the parameters for the next crawl (or current crawl) to be carried out by the machines $machine_urls. Updates the $data array to be supplied to AdminView.

Parameters

array &$data

an array of info to supply to AdminView

array $machine_urls

string urls of machines managed by this Yioop name server on which to perform the crawl

crawlStatistics()

crawlStatistics(array &$data, array $machine_urls) 

Called from manageCrawls() to read in the file with statistics information about a crawl. This file is computed by AnalyticsJob.

Parameters

array &$data

an array of info to supply to AdminView

array $machine_urls

urls of the machines, managed by this Yioop name server, that are being used in the crawl

manageClassifiers()

manageClassifiers() 

Handles admin requests for creating, editing, and deleting classifiers.

This activity implements the logic for the page that lists existing classifiers, including the actions that can be performed on them.

editClassifier()

editClassifier(array  $data, array  $classifiers, array  $machine_urls) 

Handles the particulars of editing a classifier, which includes changing its label and adding training examples.

This activity directly handles changing the class label, but not adding training examples. The latter activity is done interactively without reloading the page via XmlHttpRequests, coordinated by the classifier controller dedicated to that task.

Parameters

array $data

data to be passed on to the view

array $classifiers

map from class labels to their associated classifiers

array $machine_urls

string urls of machines managed by this Yioop name server

pageOptions()

pageOptions() 

Handles admin request related to controlling file options to be used in a crawl

This activity allows a user to specify the page range size to be used during a crawl as well as which file types can be downloaded

scrapers()

scrapers() : array

Handles admin request related to the Scrapers activity

This activity allows a user to specify the configuration for the ways we detect Scrapers

Returns

array —

$data info about the Scraper settings

resultsEditor()

resultsEditor() : array

Handles admin request related to the search filter activity

This activity allows a user to specify hosts whose web pages are to be filtered out of the search results

Returns

array —

$data info about the groups and their contents for a particular crawl mix
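The idea of filtering pages from specified hosts out of search results can be sketched as below. This is a minimal illustration under stated assumptions, not how resultsEditor() or Yioop's search pipeline works: the filterOutHosts() function and the "url" key of each result are hypothetical.

```php
<?php
/**
 * Hypothetical: returns the results whose URL host is not in
 * $blocked_hosts. Host comparison is case-insensitive; results
 * whose URL has no parsable host are kept.
 */
function filterOutHosts(array $results, array $blocked_hosts): array
{
    // flip to get constant-time lookups by lowercased host name
    $blocked = array_flip(array_map("strtolower", $blocked_hosts));
    $kept = [];
    foreach ($results as $result) {
        $host = parse_url($result["url"], PHP_URL_HOST);
        if (is_string($host) && isset($blocked[strtolower($host)])) {
            continue; // page is on a filtered host
        }
        $kept[] = $result;
    }
    return $kept;
}

$results = [
    ["url" => "https://www.example.com/a"],
    ["url" => "https://www.example.org/b"],
];
$visible = filterOutHosts($results, ["WWW.EXAMPLE.ORG"]);
```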

searchSources()

searchSources() : array

Handles admin request related to the search sources activity

The search sources activity allows a user to add/delete search sources for news and podcasts. It also allows a user to control which subsearches appear on the SearchView page.

Returns

array —

$data info about current search sources, and current sub-searches