\seekquarry\yioop\library\media_jobs\WikiMediaJob

A media job to download

Subclasses should implement whichever of the methods init(), checkPrerequisites(), nondistributedTasks(), prepareTasks(), finishTasks(), getTasks(), doTasks(), and putTasks() they use. Media updating can be configured to run in either distributed or name server only mode. In distributed mode, prepareTasks() and finishTasks() run on the name server, getTasks() and putTasks() run in the name server's web app, and doTasks() runs on any MediaUpdater clients. In name server only mode, the MediaUpdater calls only nondistributedTasks(), and only the updater on the name server does so.
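The split between the two modes can be sketched as follows. This is only an illustration of the description above, not Yioop's run() implementation: the function callbacksForUpdater() and its two boolean flags are hypothetical, and the sketch ignores the $name_server_does_client_tasks setting.

```php
<?php
/**
 * Hypothetical helper: lists which MediaJob callbacks a MediaUpdater
 * process would invoke, following the class description.
 * getTasks() and putTasks() are omitted because they run in the name
 * server's web app rather than in a MediaUpdater.
 */
function callbacksForUpdater(bool $distributed, bool $is_name_server): array
{
    if (!$distributed) {
        // Name server only mode: one callback, and only on the name server.
        return $is_name_server ? ["nondistributedTasks"] : [];
    }
    // Distributed mode: the name server prepares and finishes the work,
    // while MediaUpdater clients carry it out.
    return $is_name_server ? ["prepareTasks", "finishTasks"] : ["doTasks"];
}

echo implode(", ", callbacksForUpdater(true, true)), "\n";
// prints "prepareTasks, finishTasks"
```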

Summary

Public methods
__construct()
init()
run()
checkPrerequisites()
nondistributedTasks()
prepareTasks()
finishTasks()
doTasks()
getTasks()
putTasks()
execNameServer()
getJobName()
getCurrentMachine()
parsePodcastAuxInfo()
updatePodcastsOneGo()
processHtmlPodcast()
getLinkFromQueryPage()
processFeedPodcast()
downloadPodcastItemIfNew()
downloadPodcastItem()

Public properties
$controller
$media_updater
$name_server_does_client_tasks
$name_server_does_client_tasks_only
$tasks
$update_time
$db

Constants
ITEM_EXPIRES_TIME
MAX_PODCASTS_ONE_GO

Protected methods
No protected methods found

Protected properties
No protected properties found

Private methods
makeFileNamePattern()
makeFolder()
getPage()

Private properties
No private properties found

Constants

ITEM_EXPIRES_TIME

ITEM_EXPIRES_TIME

how long in seconds before a feed item expires

MAX_PODCASTS_ONE_GO

MAX_PODCASTS_ONE_GO

Maximum number of feeds to download in one try

Properties

$controller

$controller : object

If the MediaJob was instantiated in the web app, the controller that instantiated it

Type

object

$media_updater

$media_updater : object

If the MediaJob was instantiated in a MediaUpdater, this is a reference to that updater

Type

object

$name_server_does_client_tasks

$name_server_does_client_tasks : boolean

Whether to run the job's client tasks on the name server in addition to prepareTasks and finishTasks

Type

boolean

$name_server_does_client_tasks_only

$name_server_does_client_tasks_only : boolean

Whether this MediaJob performs name server only tasks

Type

boolean

$tasks

$tasks : array

The tasks for this MediaJob most recently received from the name server

Type

array

$update_time

$update_time : integer

Time in the current epoch when the feeds were last updated

Type

integer

$db

$db : object

Datasource object used to run db queries related to feed items (for storing and updating them)

Type

object

Methods

__construct()

__construct(object  $media_updater = null, object  $controller = null) 

Instantiates the MediaJob with a reference to the object that instantiated it

Parameters

object $media_updater

a reference to the media updater that instantiated this object (if being run in MediaUpdater)

object $controller

a reference to the controller that instantiated this object (if being run in the web app)

init()

init() 

Initializes the last update time to far in the past so that feeds will get updated immediately. Sets up the connection to the DB used to store feed items, and makes it so the same media job runs on both the name server and client MediaUpdaters

run()

run() 

Method executed by MediaUpdater to perform the MediaJob. This method shouldn't need to be overridden. Instead, the various callbacks it calls (listed in the class description) should be overridden.

checkPrerequisites()

checkPrerequisites() : boolean

Only update if it's been more than an hour since the last update

Returns

boolean —

whether it's been an hour since the last update
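A minimal sketch of such an hourly check, assuming the last update time is kept as a Unix timestamp. The function hourHasPassed(), the constant SKETCH_ONE_HOUR, and the injectable $now parameter are hypothetical names for illustration and are not taken from Yioop.

```php
<?php
const SKETCH_ONE_HOUR = 3600; // seconds; assumed name, not a Yioop constant

/**
 * Returns true when more than an hour has passed since $update_time.
 * $now can be supplied so the check can be exercised without waiting.
 */
function hourHasPassed(int $update_time, ?int $now = null): bool
{
    $now = $now ?? time();
    return $now - $update_time > SKETCH_ONE_HOUR;
}

// An update time of 0 (far in the past) makes the first check succeed.
var_dump(hourHasPassed(0));
```

Starting with an update time far in the past, as init() is described as doing, means the first check passes and the feeds are refreshed straight away.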

nondistributedTasks()

nondistributedTasks() 

Get the media sources from the local database and use those to run the same task as in the distributed setting

prepareTasks()

prepareTasks() 

This method is called on the name server to prepare data for any MediaUpdater clients.

finishTasks()

finishTasks() 

This method is called on the name server to finish processing any data returned by MediaUpdater clients.

doTasks()

doTasks(array  $tasks) : mixed

For each feed source, downloads the feeds, checks which items are not in the database, and adds them. Then calls the method to rebuild the inverted index shard for feeds

Parameters

array $tasks

array of feed info (url to download, paths to extract etc)

Returns

mixed —

the result of carrying out that processing

getTasks()

getTasks(integer  $machine_id, array  $data = null) : array

Method called from JobController when a MediaUpdater client contacts the name server's web app. This method is supposed to marshal any data on the name server that the requesting client should process.

Parameters

integer $machine_id

id of client requesting data

array $data

any additional info about data being requested

Returns

array —

work for the client to process

putTasks()

putTasks(integer  $machine_id, mixed  $data) : array

After a MediaUpdater client is done with the task given to it by the name server's media updater, the client contacts the name server's web app. The name server's web app's JobController then calls this method to receive the data on the name server.

Parameters

integer $machine_id

id of client that is sending data to name server

mixed $data

results of computation done by client

Returns

array —

any response information to send back to the client

execNameServer()

execNameServer(string  $command, string  $args = null) : array

Executes a method on the name server's JobController.

It will typically execute either getTask or putTask for a specific MediaJob, or getUpdateProperties to find out how the current MediaUpdater should be configured.

Parameters

string $command

the method to invoke on the name server

string $args

additional arguments to be passed to the name server

Returns

array —

data returned by the name server.

getJobName()

getJobName() : string

Gets the class name (without the namespace and the word Job) of the current MediaJob

Returns

string —

name of the current job
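One way such a name could be derived is sketched below. The function jobNameFromClass() is a hypothetical stand-in, and it assumes that only a trailing "Job" is removed; the source does not say how the real method treats other positions.

```php
<?php
/**
 * Hypothetical stand-in for getJobName(): drops the namespace and a
 * trailing "Job" from a fully qualified class name.
 */
function jobNameFromClass(string $class_name): string
{
    // Keep only the part after the last namespace separator, if any.
    $pos = strrpos($class_name, "\\");
    $short = ($pos === false) ? $class_name : substr($class_name, $pos + 1);
    // Remove the suffix "Job" when present.
    if (substr($short, -3) === "Job") {
        $short = substr($short, 0, -3);
    }
    return $short;
}

echo jobNameFromClass("seekquarry\\yioop\\library\\media_jobs\\WikiMediaJob"), "\n";
// prints "WikiMedia"
```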

getCurrentMachine()

getCurrentMachine() : string

Returns a hash of the url of the current machine based on the value saved to current_machine_info.txt by a machine statuses request

Returns

string —

hash of current machine url

parsePodcastAuxInfo()

parsePodcastAuxInfo(  $podcast, boolean  $test_mode = false) 

Used to fill in details for an associative array containing the details of a wiki feed and scrape podcast that should be examined to see if new items should be downloaded to wiki pages. As part of processing, expired feed items for the given wiki might be deleted.

Parameters

$podcast

boolean $test_mode

if true then expired feed items are not culled from disk, but previously downloaded items are returned as if they had been.

updatePodcastsOneGo()

updatePodcastsOneGo(  $podcasts, integer  $age = \seekquarry\yioop\configs\ONE_WEEK, boolean  $test_mode = false) : mixed

For each of a supplied list of podcast associative arrays, downloads the non-expired media for that podcast to the wiki folder specified.

Parameters

$podcasts

integer $age

maximum age of items to consider for download

boolean $test_mode

if true then rather than updating items in wiki, returns a string summarizing the results of the downloads that would occur as part of updating the podcast

Returns

mixed —

either true, or if $test_mode is true then the results as a string of the operations involved in downloading the podcasts

processHtmlPodcast()

processHtmlPodcast(  $podcast, integer  $age, boolean  $test_mode = false) : array

Used to download the media item associated with an HTML scrape podcast

Parameters

$podcast

integer $age

max age of the media item to be considered for download

boolean $test_mode

if true then rather than updating items in wiki, returns a string summarizing the results of the downloads that would occur as part of updating the podcast

Returns

array —

[whether item downloaded, test_mode_info_string if applicable or "" otherwise]

getLinkFromQueryPage()

getLinkFromQueryPage(string  $xpath, string  $page, string  $dom, string  $source_url) : string

Used to extract a URL from a page, either as a string or in DOM form, and to canonicalize it based on a starting url.

Parameters

string $xpath

either an xpath to look into a dom object or a regex to search a page as a string

string $page

source page to search in as a string

string $dom

source page as a dom object

string $source_url

url to use to canonicalize an incomplete url if the extraction only produces part of a url

Returns

string —

desired url link
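The string-search half of this behaviour can be sketched as below. This is a simplified illustration rather than the method's actual logic: the function linkFromPage() is hypothetical, it covers only the regex case (not the xpath/DOM case), it assumes the regex's first capture group holds the link and that $source_url is well formed, and it resolves only absolute, root-relative, and directory-relative links.

```php
<?php
/**
 * Hypothetical, simplified link extraction: applies a regex whose first
 * capture group is the link, then resolves it against $source_url.
 */
function linkFromPage(string $regex, string $page, string $source_url): string
{
    if (!preg_match($regex, $page, $matches) || !isset($matches[1])) {
        return ""; // nothing found
    }
    $link = $matches[1];
    if (preg_match('/^[a-z][a-z0-9+.-]*:\/\//i', $link)) {
        return $link; // already a complete url
    }
    $parts = parse_url($source_url);
    $base = $parts['scheme'] . "://" . $parts['host'];
    if (isset($parts['port'])) {
        $base .= ":" . $parts['port'];
    }
    if (substr($link, 0, 1) === "/") {
        return $base . $link; // root-relative link
    }
    // Directory-relative link: keep the source path up to its last slash.
    $path = $parts['path'] ?? "/";
    $dir = substr($path, 0, strrpos($path, "/") + 1);
    return $base . $dir . $link;
}

echo linkFromPage('/href="([^"]+)"/', '<a href="ep/1.mp3">x</a>',
    "https://example.com/shows/index.html"), "\n";
// prints "https://example.com/shows/ep/1.mp3"
```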

processFeedPodcast()

processFeedPodcast(  $podcast, integer  $age, boolean  $test_mode = false) : mixed

Processes the page contents of one podcast feed. Determines which podcast files on that page are fresh and, if a podcast is fresh, downloads it and moves it to the appropriate wiki folder.

Parameters

$podcast

integer $age

how many seconds ago is still considered a recent enough podcast to process

boolean $test_mode

if true then rather than updating items in wiki, returns a string summarizing the results of the downloads that would occur as part of updating the podcast

Returns

mixed —

either true, or if $test_mode is true then the results as a string of the operations involved in downloading the podcasts

downloadPodcastItemIfNew()

downloadPodcastItemIfNew(array  $item,   $podcast, integer  $age) : boolean

Given a podcast item from a podcast feed page, determines if it has been downloaded or not and, if not, whether it is recent enough to download. If it is recent enough, it scrapes the file to download and downloads any other intermediate files needed to find the file to download, then finally downloads this podcast item. If the podcast item is built out of multiple videos, it concatenates them and makes a single video. It then moves the podcast item to the appropriate wiki folder.

Parameters

array $item

an associative array about one item on a podcast feed page

$podcast

integer $age

how many seconds ago is still considered a recent enough podcast to process

Returns

boolean —

whether downloaded or not.
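The "new and recent enough" decision described for this method might look like the sketch below. The function shouldDownload(), the 'guid' and 'pubdate' keys, and the list of already downloaded identifiers are all hypothetical; the source does not specify how the real method records what has been downloaded.

```php
<?php
/**
 * Hypothetical decision helper: an item is worth downloading when it has
 * not been downloaded before and was published within the last $age seconds.
 * The 'guid' and 'pubdate' keys are assumed field names for illustration.
 */
function shouldDownload(array $item, array $downloaded_guids, int $age,
    ?int $now = null): bool
{
    $now = $now ?? time();
    if (in_array($item['guid'], $downloaded_guids, true)) {
        return false; // already downloaded
    }
    // Recent enough if published no more than $age seconds ago.
    return ($now - $item['pubdate']) <= $age;
}

$item = ['guid' => 'episode-7', 'pubdate' => time() - 3600];
var_dump(shouldDownload($item, ['episode-6'], 7 * 24 * 3600));
```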

downloadPodcastItem()

downloadPodcastItem(string  $url, string  $type = "mp4", array  $audiolist_urls = array()) : string

Helper method for downloadPodcastItemIfNew(), called when it is known that a podcast item should be downloaded. It downloads a podcast item.

If the podcast item is an intermediate file pointing to several items to download, such as video, it downloads these and concatenates them to make a single video.

Parameters

string $url

url of podcast item to download

string $type

file type of podcast item

array $audiolist_urls

an array of audio urls to download if this has already been obtained

Returns

string —

with podcast item if successful or false otherwise

makeFileNamePattern()

makeFileNamePattern(string  $file_name, string  $file_pattern, string  $title = "", integer  $pubdate = null) : string

Used to construct a filename for a downloaded podcast item suitable to be used when stored in a wiki page's resource folder

Parameters

string $file_name

name of file

string $file_pattern

string which can contain %F for the previous filename, %T for the title, and date codes of the form %date_command, for example, %Y for year, %m for month, %d for day, etc. These will be substituted with their values when writing out the wiki name for the downloaded podcast item.

string $title

a title string for wiki item

integer $pubdate

when the wiki item was published as a Unix timestamp. The value of this is used when computing values for the $file_pattern

Returns

string —

output filename for wiki item
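A sketch of how such a pattern might be expanded is given below. The function fillFilePattern() is hypothetical. It assumes that any letter other than F or T after a % is a PHP date() format character, which matches the %Y, %m, and %d examples but may differ from the real method for other codes, and it uses gmdate() so that the result does not depend on the local time zone.

```php
<?php
/**
 * Hypothetical pattern expansion: %F becomes the previous file name,
 * %T the title, and any other %letter is passed to gmdate() together
 * with the publication timestamp.
 */
function fillFilePattern(string $file_name, string $file_pattern,
    string $title = "", ?int $pubdate = null): string
{
    $pubdate = $pubdate ?? time();
    // Handle every %letter token in a single pass so that substituted
    // text (for example a title containing "%") is not expanded again.
    return preg_replace_callback('/%([A-Za-z])/',
        function ($matches) use ($file_name, $title, $pubdate) {
            switch ($matches[1]) {
                case 'F':
                    return $file_name;
                case 'T':
                    return $title;
                default:
                    return gmdate($matches[1], $pubdate);
            }
        }, $file_pattern);
}

echo fillFilePattern("show.mp3", "%Y-%m-%d-%T-%F", "Episode", 0), "\n";
// prints "1970-01-01-Episode-show.mp3"
```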

makeFolder()

makeFolder(string  $folder) : boolean

Makes a directory in a way compatible with yioop's error handling.

Parameters

string $folder

name of directory/folder to create.

Returns

boolean —

whether directory was created

getPage()

getPage(  $url) : string

Downloads the internet page with the given url.

Parameters

$url

The url of the page to download

Returns

string —

contents of downloaded page