\seekquarry\yioop\library\media_jobs\RecommendationJob

Recommendation Job recommends trending threads, as well as threads and groups that are relevant based on the user's viewing history.

Subclasses should implement the methods they use among init(), checkPrerequisites(), nondistributedTasks(), prepareTasks(), finishTasks(), getTasks(), doTasks(), and putTasks(). MediaUpdating can be configured to run in either distributed or name-server-only mode. In the former mode, prepareTasks() and finishTasks() run on the name server, getTasks() and putTasks() run in the name server's web app, and doTasks() runs on any MediaUpdater client. In the latter mode, only nondistributedTasks() is called, and only by the MediaUpdater on the name server.
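The two modes described above can be sketched as a small dispatcher. This is an illustrative Python sketch, not Yioop's PHP code: the class SketchJob, the function run_sketch, and the flags distributed and on_name_server are hypothetical names, and both the prerequisite check before dispatch and the order of the name server callbacks are assumptions.

```python
class SketchJob:
    """Hypothetical stand-in for a MediaJob subclass; records which callbacks ran."""

    def __init__(self):
        self.calls = []

    def check_prerequisites(self):
        return True

    def nondistributed_tasks(self):
        self.calls.append("nondistributedTasks")

    def prepare_tasks(self):
        self.calls.append("prepareTasks")

    def finish_tasks(self):
        self.calls.append("finishTasks")

    def do_tasks(self, tasks):
        self.calls.append("doTasks")
        return tasks


def run_sketch(job, distributed, on_name_server, tasks=None):
    """Dispatch to callbacks by mode (assumed ordering, for illustration only)."""
    if not job.check_prerequisites():
        return
    if not distributed:
        # Name-server-only mode: only the updater on the name server does work.
        if on_name_server:
            job.nondistributed_tasks()
        return
    if on_name_server:
        # Distributed mode, name server side.
        job.prepare_tasks()
        job.finish_tasks()
    else:
        # Distributed mode, client side: process tasks fetched via getTasks().
        job.do_tasks(tasks or [])
```

In this sketch a client in name-server-only mode does nothing, which matches the statement that only the updater on the name server calls nondistributedTasks().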

Summary

Methods: __construct(), init(), run(), checkPrerequisites(), nondistributedTasks(), prepareTasks(), finishTasks(), doTasks(), getTasks(), putTasks(), execNameServer(), getJobName(), getCurrentMachine(), initializeNewUserRecommendations(), computeThreadGroupRecommendations(), clearIntermediateAndOldRecommendationData(), numberItems(), numberUsers(), computeItemTermFrequencies(), termCount(), computeUserTermFrequencies(), computeUserItemIdf(), tfIdfUsers(), tfIdfItems(), computeUserItemSimilarity(), calculateSimilarityRecommendations()

Properties: $controller, $media_updater, $name_server_does_client_tasks, $name_server_does_client_tasks_only, $tasks, $update_time, $active_time, $item_idf, $user_idf

Constants: BATCH_SQL_INSERT_NUM, MAX_GROUP_ITEMS, MAX_TERMS

No protected methods or properties found

No private methods or properties found

Constants

BATCH_SQL_INSERT_NUM

BATCH_SQL_INSERT_NUM

Number of inserts to try to group into a single insert statement before execution
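To illustrate what grouping inserts into a single statement means, here is a minimal Python sketch using the standard sqlite3 module. It is not Yioop's PHP implementation: the table demo_scores, the function batched_insert, and the batch size of 3 are hypothetical, and the real value of the constant is not stated here.

```python
import sqlite3

BATCH_SQL_INSERT_NUM = 3  # hypothetical value chosen for the example


def batched_insert(conn, rows, batch_size=BATCH_SQL_INSERT_NUM):
    """Insert (user_id, item_id, score) rows, up to batch_size rows per statement."""
    statements = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        # One "(?, ?, ?)" group per row, joined into a multi-row VALUES list.
        placeholders = ", ".join(["(?, ?, ?)"] * len(batch))
        flat = [value for row in batch for value in row]
        conn.execute(
            "INSERT INTO demo_scores (user_id, item_id, score) VALUES "
            + placeholders,
            flat,
        )
        statements += 1
    conn.commit()
    return statements


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_scores (user_id INTEGER, item_id INTEGER, score REAL)"
)
rows = [(u, i, 0.5) for u in range(2) for i in range(4)]  # 8 rows in total
statement_count = batched_insert(conn, rows)
row_count = conn.execute("SELECT COUNT(*) FROM demo_scores").fetchone()[0]
```

With 8 rows and a batch size of 3, the sketch issues three INSERT statements instead of eight.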

MAX_GROUP_ITEMS

MAX_GROUP_ITEMS

Maximum number of group items used in making recommendations

MAX_TERMS

MAX_TERMS

Maximum number of terms used in making recommendations

Properties

$controller

$controller : object

If the MediaJob was instantiated in the web app, the controller that instantiated it

Type

object

$media_updater

$media_updater : object

If the MediaJob was instantiated in a MediaUpdater, this is a reference to that updater

Type

object

$name_server_does_client_tasks

$name_server_does_client_tasks : boolean

Whether to run the job's client tasks on the name server in addition to prepareTasks and finishTasks

Type

boolean

$name_server_does_client_tasks_only

$name_server_does_client_tasks_only : boolean

Whether this MediaJob performs name server only tasks

Type

boolean

$tasks

$tasks : array

The tasks for this MediaJob most recently received from the name server

Type

array

$update_time

$update_time : integer

Time in the current epoch when analytics were last updated

Type

integer

$active_time

$active_time : integer

Used to track the timestamp of the active recommendations

Type

integer

$item_idf

$item_idf : array

Associative array of the number of items a term appears in

Type

array

$user_idf

$user_idf : array

Associative array of the number of user views a term appears in

Type

array

Methods

__construct()

__construct(object  $media_updater = null, object  $controller = null) 

Instantiates the MediaJob with a reference to the object that instantiated it

Parameters

object $media_updater

a reference to the media updater that instantiated this object (if being run in MediaUpdater)

object $controller

a reference to the controller that instantiated this object (if being run in the web app)

init()

init() 

Sets up the database connection so that tables related to recommendations can be accessed. Initializes timing information related to the job.

run()

run() 

Method executed by MediaUpdater to perform the MediaJob. This method shouldn't need to be overridden. Instead, the various callbacks it calls (listed in the class description) should be overridden.

checkPrerequisites()

checkPrerequisites() : boolean

Only update if it's been more than an hour since the last update

Returns

boolean —

whether it's been an hour since the last update
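The hourly check can be sketched as a pure function. This is an illustrative Python sketch rather than the PHP method: the names check_prerequisites and ONE_HOUR and the explicit now parameter are assumptions made so the example is testable.

```python
import time

ONE_HOUR = 3600  # seconds


def check_prerequisites(update_time, now=None):
    """Return True only if more than an hour has passed since update_time."""
    now = time.time() if now is None else now
    return now - update_time > ONE_HOUR
```

A job would typically compare against its stored $update_time and record a new timestamp once it decides to run; that bookkeeping is left out of the sketch.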

nondistributedTasks()

nondistributedTasks() 

For now, the analytics update is done only on the name server, as Yioop currently supports only one DBMS at a time.

prepareTasks()

prepareTasks() 

This method is called on the name server to prepare data for any MediaUpdater clients.

finishTasks()

finishTasks() 

This method is called on the name server to finish processing any data returned by MediaUpdater clients.

doTasks()

doTasks(array  $tasks) : mixed

This method is run on a MediaUpdater client with data obtained from the name server by getTasks. The idea is that the client then processes this information and, if need be, sends the results back to the name server.

Parameters

array $tasks

data that the MediaJob running on a client MediaUpdater needs to process

Returns

mixed —

the result of carrying out that processing

getTasks()

getTasks(integer  $machine_id, array  $data = null) : array

Method called from JobController when a MediaUpdater client contacts the name server's web app. This method is supposed to marshal any data on the name server that the requesting client should process.

Parameters

integer $machine_id

id of client requesting data

array $data

any additional info about data being requested

Returns

array —

work for the client to process

putTasks()

putTasks(integer  $machine_id, mixed  $data) : array

After a MediaUpdater client is done with the task given to it by the name server's media updater, the client contacts the name server's web app. The JobController of the name server's web app then calls this method to receive the data on the name server.

Parameters

integer $machine_id

id of client that is sending data to name server

mixed $data

results of computation done by client

Returns

array —

any response information to send back to the client

execNameServer()

execNameServer(string  $command, string  $args = null) : array

Executes a method on the name server's JobController.

It will typically execute either getTask or putTask for a specific MediaJob, or getUpdateProperties to find out how the current MediaUpdater should be configured.

Parameters

string $command

the method to invoke on the name server

string $args

additional arguments to be passed to the name server

Returns

array —

data returned by the name server.

getJobName()

getJobName() : string

Gets the class name (without the namespace and the word Job) of the current MediaJob

Returns

string —

name of the current job
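As a rough illustration of the naming rule, the Python sketch below strips the namespace and a trailing "Job" from a fully qualified class name. The function job_name is hypothetical and is not the PHP implementation, which may derive the name differently.

```python
def job_name(qualified_class_name):
    """Drop the namespace prefix and a trailing 'Job' from a class name."""
    short = qualified_class_name.rsplit("\\", 1)[-1]
    if short.endswith("Job"):
        short = short[: -len("Job")]
    return short
```

Under this rule the class documented on this page would yield the name Recommendation.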

getCurrentMachine()

getCurrentMachine() : string

Returns a hash of the url of the current machine based on the value saved to current_machine_info.txt by a machine statuses request

Returns

string —

hash of current machine url

initializeNewUserRecommendations()

initializeNewUserRecommendations() 

Computes recommendations for users who have yet to receive any recommendation of the given type, based on the most popular recommendation

computeThreadGroupRecommendations()

computeThreadGroupRecommendations() 

Manages the whole process of computing thread and group recommendations for users. Makes a series of calls to handle parts of this computation before synthesizing the result

clearIntermediateAndOldRecommendationData()

clearIntermediateAndOldRecommendationData() 

Delete all rows from intermediate tables used in the calculation of group and thread recommendations. Also clears any non-active item recommendations

numberItems()

numberItems() : integer

Computes the number of group items

Returns

integer —

number of items

numberUsers()

numberUsers() : integer

Computes the number of users

Returns

integer —

number of users

computeItemTermFrequencies()

computeItemTermFrequencies() 

Computes the term frequencies for individual items (posts) in group feeds. That is, for each item in each group, and for each term in that item, it computes the number of times the term appears in that item.

termCount()

termCount(string  $record) : array

Calculates term => frequency pairs for all terms in a supplied string

Parameters

string $record

string of terms

Returns

array —

$term_frequencies associative array term => count
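A term => count mapping of this kind can be sketched in a few lines of Python. The tokenization here (lowercasing and splitting on non-word characters) is an assumption for the example; how the PHP method splits and normalizes terms is not described on this page.

```python
import re
from collections import Counter


def term_count(record):
    """Return a dict mapping each term in record to its number of occurrences."""
    terms = re.findall(r"\w+", record.lower())
    return dict(Counter(terms))
```

For instance, under these assumptions term_count("Search engine search") maps "search" to 2 and "engine" to 1.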

computeUserTermFrequencies()

computeUserTermFrequencies() 

Calculates the term frequencies for users. That is, for each user and each term, how often the user has seen a post containing that term

computeUserItemIdf()

computeUserItemIdf(integer  $number_items, integer  $number_users) 

Computes inverse document frequencies for each term, both for users and for items. For a particular term and user, it computes the number of times the user used that term in a post divided by the number of posts by that user, and takes the log of the result. For items the idea is similar: for each thread, it calculates the number of posts in which the term appeared divided by the total number of posts in the thread, and takes the log of the result.

Parameters

integer $number_items

number of items

integer $number_users

number of users
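The counting and logarithm steps can be sketched as follows. This Python sketch uses the textbook form log(N / n_t), which is the negative of the log of the ratio as worded above; which sign convention the PHP code stores is not confirmed here. The function names document_counts and idf are hypothetical.

```python
import math


def document_counts(term_lists):
    """Count, for each term, the number of documents it appears in."""
    counts = {}
    for terms in term_lists:
        for term in set(terms):  # count each term once per document
            counts[term] = counts.get(term, 0) + 1
    return counts


def idf(counts, total):
    """Textbook inverse document frequency: log(total / documents with term)."""
    return {term: math.log(total / n) for term, n in counts.items()}


items = [["yioop", "search"], ["search", "engine"], ["search"]]
item_counts = document_counts(items)
item_idf = idf(item_counts, len(items))
```

A term that appears in every item, such as "search" here, gets a weight of zero, so it contributes nothing to later similarity scores.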

tfIdfUsers()

tfIdfUsers() 

Calculates the product TF * IDF for users based on the results of computeUserItemIdf() and computeUserTermFrequencies()

tfIdfItems()

tfIdfItems() 

Calculates the product TF * IDF for items based on the results of computeUserItemIdf() and computeItemTermFrequencies()
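Both tfIdfUsers() and tfIdfItems() form a per-term product of a term frequency and an IDF weight. A minimal Python sketch of that product is shown below; the function tf_idf and the choice to weight unseen terms as 0.0 are assumptions, and the PHP methods presumably work on database tables rather than in-memory dictionaries.

```python
def tf_idf(term_frequencies, idf):
    """Multiply each term's frequency by its IDF weight (0.0 if unknown)."""
    return {
        term: tf * idf.get(term, 0.0)
        for term, tf in term_frequencies.items()
    }
```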

computeUserItemSimilarity()

computeUserItemSimilarity() 

Computes the cosine similarity between users and particular threads based on TF*IDF scores and inserts the result into USER_ITEM_SIMILARITY
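Cosine similarity between two sparse TF*IDF vectors can be sketched in Python as below. The dictionaries stand in for one user's and one thread's term weights; the function name and the convention of returning 0.0 for an all-zero vector are assumptions, and writing the result to USER_ITEM_SIMILARITY is omitted.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between sparse vectors given as term => weight dicts."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # no meaningful direction to compare
    return dot / (norm_u * norm_v)
```

Vectors that share no terms score 0, and vectors pointing in the same direction score 1 regardless of their lengths.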

calculateSimilarityRecommendations()

calculateSimilarityRecommendations(integer  $recommendation_type,   $similarity_sql, integer  $max_recommendations) 

Computes up to $max_recommendations item recommendations of the given type (thread or group) per user, based on a query that computes a similarity score between a user and an item of the given type.

Parameters

integer $recommendation_type

a config.php constant indicating the type of recommendation to compute

$similarity_sql

query used to determine user similarity scores; it should output triples (user_id, item_id, rating)

integer $max_recommendations

maximum number of recommendations to compute per user
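Selecting the best-rated items per user from (user_id, item_id, rating) triples can be sketched as follows. This Python sketch operates on an in-memory list instead of query results, and top_recommendations is a hypothetical name; how the PHP method breaks ties or stores the results is not covered here.

```python
import heapq
from collections import defaultdict


def top_recommendations(triples, max_recommendations):
    """Keep the highest-rated item ids per user, best first."""
    by_user = defaultdict(list)
    for user_id, item_id, rating in triples:
        by_user[user_id].append((rating, item_id))
    return {
        user_id: [item for _, item in heapq.nlargest(max_recommendations, scored)]
        for user_id, scored in by_user.items()
    }


triples = [(1, 10, 0.9), (1, 11, 0.2), (1, 12, 0.5), (2, 10, 0.1)]
recommendations = top_recommendations(triples, 2)
```

Here user 1 keeps items 10 and 12, and user 2, who has a single candidate, keeps item 10.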