ITEM_EXPIRES_TIME
how long in seconds before a feed item expires
A media job to download
Subclasses should implement whichever of the following methods they use: init(), checkPrerequisites(), nondistributedTasks(), prepareTasks(), finishTasks(), getTasks(), doTasks(), and putTasks(). Media updating can be configured to run in either distributed or name-server-only mode. In distributed mode, prepareTasks() and finishTasks() run on the name server, getTasks() and putTasks() run in the name server's web app, and doTasks() runs on any MediaUpdater client. In name-server-only mode, the MediaUpdater calls only nondistributedTasks(), and only the updater on the name server does so.
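The split between the two modes can be sketched as a single simulated update round. This is a minimal Python sketch, not Yioop's PHP code: RecordingJob, run_one_round, and the snake_case hook names are hypothetical stand-ins, init() and checkPrerequisites() are omitted, and the order prepare, get, do, put, finish within one round is an assumption. In a real deployment these steps run on different machines.

```python
from typing import List


class RecordingJob:
    """Hypothetical MediaJob stand-in that records which hooks were called."""

    def __init__(self) -> None:
        self.calls: List[str] = []
        self.pending: List[str] = []
        self.results: List[str] = []

    def nondistributed_tasks(self) -> None:  # name-server-only mode
        self.calls.append("nondistributedTasks")

    def prepare_tasks(self) -> None:  # distributed: runs on the name server
        self.calls.append("prepareTasks")
        self.pending = ["a", "b"]

    def get_tasks(self, machine_id: int) -> List[str]:  # name server's web app
        self.calls.append("getTasks")
        tasks, self.pending = self.pending, []
        return tasks

    def do_tasks(self, tasks: List[str]) -> List[str]:  # any MediaUpdater client
        self.calls.append("doTasks")
        return [t.upper() for t in tasks]

    def put_tasks(self, machine_id: int, data: List[str]) -> None:  # web app
        self.calls.append("putTasks")
        self.results.extend(data)

    def finish_tasks(self) -> None:  # distributed: runs on the name server
        self.calls.append("finishTasks")


def run_one_round(job: RecordingJob, distributed: bool, machine_id: int = 1) -> None:
    """Simulate one round of media updating in a single process."""
    if not distributed:
        # Only the name server's updater calls nondistributedTasks().
        job.nondistributed_tasks()
        return
    job.prepare_tasks()
    tasks = job.get_tasks(machine_id)
    results = job.do_tasks(tasks)
    job.put_tasks(machine_id, results)
    job.finish_tasks()
```

A job that only overrides nondistributed_tasks() is enough for name-server-only mode; the other hooks matter only when work is farmed out to clients.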
__construct(object $media_updater = null, object $controller = null)
Instantiates the MediaJob with a reference to the object that instantiated it
object | $media_updater | a reference to the media updater that instantiated this object (if being run in MediaUpdater) |
object | $controller | a reference to the controller that instantiated this object (if being run in the web app) |
doTasks(array $tasks) : mixed
For each feed source, downloads the feeds, checks which items are not yet in the database, and adds them. Then calls the method to rebuild the inverted index shard for feeds.
array | $tasks | array of feed info (URL to download, paths to extract, etc.) |
the result of carrying out that processing
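The "add only items not already in the database" step can be sketched as a set-membership check. This is a Python sketch under stated assumptions: downloading and the shard rebuild are omitted, the database is modeled as a set of keys, and keying an item by a SHA-256 hash of its link is a hypothetical choice; Yioop's actual item key may differ.

```python
import hashlib
from typing import Dict, List, Set

Item = Dict[str, str]


def item_key(item: Item) -> str:
    # Hypothetical key: hash of the item's link.
    return hashlib.sha256(item["link"].encode("utf-8")).hexdigest()


def add_new_items(feeds: Dict[str, List[Item]], known: Set[str]) -> List[Item]:
    """Return items not seen before, recording their keys in `known`."""
    added: List[Item] = []
    for source, items in feeds.items():
        for item in items:
            key = item_key(item)
            if key in known:
                continue  # already in the "database"
            known.add(key)
            added.append({**item, "source": source})
    return added
```

Because keys are added as they are seen, an item that appears in two feeds in the same run is stored once, under the first source that lists it.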
getTasks(integer $machine_id, array $data = null) : array
Method called from JobController when a MediaUpdater client contacts the name server's web app. This method is supposed to marshal any data on the name server that the requesting client should process.
integer | $machine_id | id of client requesting data |
array | $data | any additional info about data being requested |
work for the client to process
putTasks(integer $machine_id, mixed $data) : array
After a MediaUpdater client is done with the task given to it by the name server's media updater, the client contacts the name server's web app. The name server's web app's JobController then calls this method to receive the data on the name server.
integer | $machine_id | id of client that is sending data to name server |
mixed | $data | results of computation done by client |
any response information to send back to the client
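One way to picture what getTasks() and putTasks() do on the name server is a shared queue with per-machine bookkeeping. This Python sketch is an assumption, not the PHP implementation: NameServerTaskBoard, batch_size, and the {"ACK": ..., "remaining": ...} response are all hypothetical.

```python
from collections import deque
from typing import Any, Deque, Dict, List, Optional


class NameServerTaskBoard:
    """Hypothetical name-server bookkeeping behind getTasks()/putTasks()."""

    def __init__(self, tasks: List[Any], batch_size: int = 2) -> None:
        self.pending: Deque[Any] = deque(tasks)
        self.batch_size = batch_size
        self.assigned: Dict[int, List[Any]] = {}
        self.results: Dict[int, List[Any]] = {}

    def get_tasks(self, machine_id: int, data: Optional[Any] = None) -> List[Any]:
        # Hand the requesting client up to batch_size pending tasks.
        batch: List[Any] = []
        while self.pending and len(batch) < self.batch_size:
            batch.append(self.pending.popleft())
        self.assigned[machine_id] = batch
        return batch

    def put_tasks(self, machine_id: int, data: List[Any]) -> Dict[str, Any]:
        # Store the client's results and clear its assignment.
        self.results.setdefault(machine_id, []).extend(data)
        self.assigned.pop(machine_id, None)
        return {"ACK": True, "remaining": len(self.pending)}
```

Keying assignments by machine_id makes it possible to tell which client still owes results; a real implementation would also need to requeue work from clients that never report back.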
execNameServer(string $command, string $args = null) : array
Executes a method on the name server's JobController.
It will typically execute either getTasks or putTasks for a specific MediaJob, or getUpdateProperties to find out how the current MediaUpdater should be configured.
string | $command | the method to invoke on the name server |
string | $args | additional arguments to be passed to the name server |
data returned by the name server.
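Calling a method on the name server's JobController amounts to an HTTP request naming the command and its arguments. The Python sketch below only builds such a request URL; the query parameter names "c" and "a", the value "jobs", and the URL layout are hypothetical placeholders, not Yioop's actual request format. Sending the request (for example with urllib.request.urlopen) and decoding the reply are left out.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def build_name_server_request(name_server_url: str, command: str,
                              args: Optional[str] = None) -> str:
    """Build a URL asking the name server's job controller to run `command`."""
    # Hypothetical parameters: "c" picks the controller, "a" the method.
    params: Dict[str, str] = {"c": "jobs", "a": command}
    if args is not None:
        params["args"] = args  # urlencode percent-escapes "=" and "&"
    return name_server_url.rstrip("/") + "/?" + urlencode(params)
```

Passing args through urlencode keeps a serialized argument string from being split into extra query parameters.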
parsePodcastAuxInfo( $podcast, boolean $test_mode = false)
Used to fill in details for an associative array containing the details of a wiki feed or scrape podcast that should be examined to see whether new items should be downloaded to wiki pages. As part of this processing, expired feed items for the given wiki might be deleted.
$podcast | ||
boolean | $test_mode | if true, does not cull expired feed items from disk, but returns the previously downloaded items as if it had |
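Culling expired items, together with the test-mode behavior described above, can be sketched in Python as follows. The expires_after parameter stands in for ITEM_EXPIRES_TIME (its value is not given here), and the downloaded_at field and the delete callback are hypothetical. In test mode the function reports what would expire but never touches disk.

```python
from typing import Callable, Dict, List, Tuple

Item = Dict[str, object]


def cull_expired_items(items: List[Item], now: float, expires_after: float,
                       delete: Callable[[Item], None],
                       test_mode: bool = False) -> Tuple[List[Item], List[Item]]:
    """Split items into (kept, expired); delete expired ones unless in test mode."""
    kept: List[Item] = []
    expired: List[Item] = []
    for item in items:
        # Hypothetical field: when the item was downloaded, as a Unix timestamp.
        age_seconds = now - float(item["downloaded_at"])
        if age_seconds > expires_after:
            expired.append(item)
        else:
            kept.append(item)
    if not test_mode:
        for item in expired:
            delete(item)  # only touch disk outside test mode
    return kept, expired
```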
updatePodcastsOneGo( $podcasts, integer $age = \seekquarry\yioop\configs\ONE_WEEK, boolean $test_mode = false) : mixed
For each of a supplied list of podcast associative arrays, downloads the non-expired media for that podcast to the wiki folder specified.
$podcasts | ||
integer | $age | maximum age of items to consider for download |
boolean | $test_mode | if true, then rather than updating items in the wiki, returns a string summarizing the results of the downloads that would occur as part of updating the podcast |
either true, or if $test_mode is true then the results as a string of the operations involved in downloading the podcasts
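The per-podcast loop can be sketched as a dispatch on podcast type, collecting summaries only in test mode. This Python sketch assumes a hypothetical "type" key whose values select a handler (standing in for processHtmlPodcast and processFeedPodcast), and assumes ONE_WEEK is measured in seconds.

```python
from typing import Callable, Dict, List, Union

ONE_WEEK = 7 * 24 * 60 * 60  # assumed to mirror configs\ONE_WEEK, in seconds

# A handler takes (podcast, age, test_mode) and returns a summary string.
Handler = Callable[[Dict[str, str], int, bool], str]


def update_podcasts_one_go(podcasts: List[Dict[str, str]],
                           handlers: Dict[str, Handler],
                           age: int = ONE_WEEK,
                           test_mode: bool = False) -> Union[bool, str]:
    """Run the matching handler for each podcast."""
    report: List[str] = []
    for podcast in podcasts:
        handler = handlers[podcast["type"]]  # hypothetical "type" key
        summary = handler(podcast, age, test_mode)
        if test_mode:
            report.append(summary)
    # Mirrors the documented return: true, or a string in test mode.
    return "\n".join(report) if test_mode else True
```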
processHtmlPodcast( $podcast, integer $age, boolean $test_mode = false) : array
Used to download the media item associated with an HTML scrape podcast
$podcast | ||
integer | $age | maximum age of the media item to be considered for download |
boolean | $test_mode | if true, then rather than updating items in the wiki, returns a string summarizing the results of the downloads that would occur as part of updating the podcast |
[whether item downloaded, test_mode_info_string if applicable or "" otherwise]
getLinkFromQueryPage(string $xpath, string $page, string $dom, string $source_url) : string
Used to extract a URL from a page, given either as a string or in DOM form, and to canonicalize it based on a starting URL.
string | $xpath | either an xpath to look into a dom object or a regex to search a page as a string |
string | $page | source page to search in as a string |
string | $dom | source page as a dom object |
string | $source_url | url to use to canonicalize an incomplete url if the extraction only produces part of a url |
desired url link
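The regex-on-a-string branch plus canonicalization can be sketched with Python's re and urllib.parse.urljoin. The DOM/XPath branch is omitted, and returning an empty string when nothing matches is an assumption. urljoin leaves absolute URLs unchanged and resolves partial ones against source_url.

```python
import re
from urllib.parse import urljoin


def get_link_from_page(pattern: str, page: str, source_url: str) -> str:
    """Regex branch only: find a link in `page` and resolve it against `source_url`."""
    match = re.search(pattern, page)
    if match is None:
        return ""  # assumption: empty string when nothing matches
    # Use the first capture group if the pattern has one, else the whole match.
    found = match.group(1) if match.re.groups else match.group(0)
    return urljoin(source_url, found)
```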
processFeedPodcast( $podcast, integer $age, boolean $test_mode = false) : mixed
Processes the page contents of one podcast feed. Determines which podcast files on that page are fresh and, if a podcast is fresh, downloads it and moves it to the appropriate wiki folder.
$podcast | ||
integer | $age | how many seconds ago is still considered a recent enough podcast to process |
boolean | $test_mode | if true, then rather than updating items in the wiki, returns a string summarizing the results of the downloads that would occur as part of updating the podcast |
either true, or if $test_mode is true then the results as a string of the operations involved in downloading the podcasts
downloadPodcastItemIfNew(array $item, $podcast, integer $age) : boolean
Given a podcast item from a podcast feed page, determines whether it has already been downloaded and, if not, whether it is recent enough to download. If it is recent enough, it scrapes for the file to download, downloads any intermediate files needed to find that file, and finally downloads the podcast item itself. If the podcast item is built out of multiple videos, it concatenates them into a single video. It then moves the podcast item to the appropriate wiki folder.
array | $item | an associative array about one item on a podcast feed page |
$podcast | ||
integer | $age | how many seconds ago is still considered a recent enough podcast to process |
whether downloaded or not.
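The "not yet downloaded and recent enough" test can be sketched as a filter. This Python sketch assumes hypothetical guid and pubdate (Unix timestamp) fields on each item and a set of GUIDs already downloaded. An item qualifies only if it is new and was published no more than age seconds before now.

```python
from typing import Dict, List, Set

Item = Dict[str, object]


def fresh_undownloaded(items: List[Item], downloaded_guids: Set[str],
                       now: int, age: int) -> List[Item]:
    """Keep items that are new and published within the last `age` seconds."""
    return [
        item for item in items
        if item["guid"] not in downloaded_guids  # hypothetical "guid" field
        and now - int(item["pubdate"]) <= age    # hypothetical "pubdate" field
    ]
```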
downloadPodcastItem(string $url, string $type = "mp4", array $audiolist_urls = array()) : string
Helper method for downloadPodcastItemIfNew (@see downloadPodcastItemIfNew) that downloads a podcast item once it is known that the item should be downloaded.
If the podcast item is an intermediate file pointing to several items to download, such as video, it downloads these and concatenates them to make a single video.
string | $url | URL of podcast item to download |
string | $type | file type of podcast item |
array | $audiolist_urls | an array of audio urls to download if this has already been obtained |
string containing the podcast item if successful, or false otherwise
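Handling an intermediate file that points to several parts can be sketched as follows. This Python sketch assumes the intermediate file is an HLS-style .m3u8 playlist, where non-comment lines name segment URLs, possibly relative to the playlist. It also assumes plain byte concatenation of the downloaded parts, which is generally workable for MPEG-TS segments but not for MP4 files; those usually need remuxing (for example with ffmpeg). Fetching the segments is omitted.

```python
from typing import List
from urllib.parse import urljoin


def segment_urls(playlist_text: str, playlist_url: str) -> List[str]:
    """List segment URLs in an .m3u8-style playlist, resolved against its URL."""
    urls: List[str] = []
    for line in playlist_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and tags such as #EXTINF
        urls.append(urljoin(playlist_url, line))
    return urls


def concatenate(parts: List[bytes]) -> bytes:
    """Join downloaded segments in playlist order (assumes a TS-like format)."""
    return b"".join(parts)
```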
makeFileNamePattern(string $file_name, string $file_pattern, string $title = "", integer $pubdate = null) : string
Used to construct a filename for a downloaded podcast item suitable to be used when stored in a wiki page's resource folder
string | $file_name | name of file |
string | $file_pattern | string which can contain %F for the previous filename, %T for the title, and date %date_command, for example, %Y for year, %m for month, %d for day, etc. These will be substituted with their values when writing out the wiki name for the downloaded podcast item. |
string | $title | a title string for wiki item |
integer | $pubdate | when the wiki item was published as a Unix timestamp. The value of this is used when computing values for the $file_pattern |
output filename for wiki item
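The pattern substitution can be sketched with a single regex pass: %F and %T map to the file name and title, and any other %letter is handed to strftime for the publication date. This Python sketch assumes dates are rendered in UTC and that a missing pubdate means "now". The PHP version may use PHP's date() format letters and the server's time zone instead. Doing one pass means a title that itself contains "%Y" is inserted literally rather than expanded again.

```python
import re
import time
from datetime import datetime, timezone
from typing import Optional


def make_file_name(file_name: str, file_pattern: str, title: str = "",
                   pubdate: Optional[int] = None) -> str:
    """Expand %F, %T, and strftime-style date codes in file_pattern."""
    # Assumption: no pubdate means "now"; dates are rendered in UTC.
    stamp = time.time() if pubdate is None else pubdate
    moment = datetime.fromtimestamp(stamp, tz=timezone.utc)

    def substitute(match: "re.Match[str]") -> str:
        code = match.group(1)
        if code == "F":
            return file_name
        if code == "T":
            return title
        return moment.strftime("%" + code)

    return re.sub(r"%([A-Za-z])", substitute, file_pattern)
```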