Constants

INDICATOR_NONE

INDICATOR_NONE

An indicator that no action is to be taken

INDICATOR_GIT

INDICATOR_GIT

An indicator for a Git repository

GIT_URL_CONTINUE

GIT_URL_CONTINUE

An indicator that more Git urls need to be fetched

GIT_BASE_URL_START

GIT_BASE_URL_START

An indicator for the starting position of the Git url to be used

GIT_BASE_URL_END

GIT_BASE_URL_END

An indicator for the ending position of the Git url to be used

GIT_URL_EXTENSION

GIT_URL_EXTENSION

A fixed component used with the Git base url to form the first Git url

GIT_URL_OBJECT

GIT_URL_OBJECT

A fixed component used with Git urls to get the next Git urls

GIT_BASE_URL_END_POSITION

GIT_BASE_URL_END_POSITION

A fixed position used to get the last letter of the Git base url

GIT_BASE_END_LETTER

GIT_BASE_END_LETTER

A fixed indicator used to get the last letter of the Git base url

GIT_NEXT_URL_START

GIT_NEXT_URL_START

A fixed position used to indicate the starting point for fetching the next Git url from the master file

GIT_NEXT_URL_END

GIT_NEXT_URL_END

A fixed position used to indicate the ending point for fetching the next Git url from the master file

GIT_URL_SPLIT

GIT_URL_SPLIT

A fixed indicator used to build the desired Git folder structure from a SHA hash

GIT_MASTER_TREE_HASH_START

GIT_MASTER_TREE_HASH_START

A fixed indicator used to mark the starting position of the SHA hash of the Git master tree

GIT_MASTER_TREE_HASH_END

GIT_MASTER_TREE_HASH_END

A fixed indicator used to mark the ending position of the SHA hash of the Git master tree

GIT_FOLDER_NAME_START

GIT_FOLDER_NAME_START

A fixed indicator used to mark the starting position, within the SHA hash, of the Git object folder name

GIT_FOLDER_NAME_END

GIT_FOLDER_NAME_END

A fixed indicator used to mark the ending position, within the SHA hash, of the Git object folder name

GIT_FILE_NAME_START

GIT_FILE_NAME_START

A fixed indicator used to mark the starting position, within the SHA hash, of the Git object file name

GIT_FILE_NAME_END

GIT_FILE_NAME_END

A fixed indicator used to mark the ending position, within the SHA hash, of the Git object file name

GIT_BLOB_OBJECT

GIT_BLOB_OBJECT

A fixed indicator for a Git blob object

GIT_TREE_OBJECT

GIT_TREE_OBJECT

A fixed indicator for a Git tree object

CURL_TIMEOUT

CURL_TIMEOUT

A cURL timeout parameter

CURL_TRANSFER

CURL_TRANSFER

A cURL transfer parameter

BLOB_ACCESS_CODE_START

BLOB_ACCESS_CODE_START

Git blob access code starting position

BLOB_ACCESS_CODE_END

BLOB_ACCESS_CODE_END

Git blob access code ending position

TREE_ACCESS_CODE_START

TREE_ACCESS_CODE_START

Git tree access code starting position

TREE_ACCESS_CODE_END

TREE_ACCESS_CODE_END

Git tree access code ending position

SHA_HASH_BINARY_START

SHA_HASH_BINARY_START

Git SHA hash binary starting position

SHA_HASH_BINARY_END

SHA_HASH_BINARY_END

Git SHA hash binary ending position

GIT_NAME_START

GIT_NAME_START

An indicator for the start of a Git file or folder name

GIT_BLOB_NEXT

GIT_BLOB_NEXT

An indicator for the next position after the access code in a Git blob object

GIT_TREE_NEXT

GIT_TREE_NEXT

An indicator for the next position after the access code in a Git tree object

HEX_NULL_CHARACTER

HEX_NULL_CHARACTER

An indicator for the null character in hexadecimal form

GIT_BLOB_INDICATOR

GIT_BLOB_INDICATOR

An indicator that a Git file is a blob file

GIT_TREE_INDICATOR

GIT_TREE_INDICATOR

An indicator that a Git file is a tree file

Properties

$repository_types

$repository_types : array

A list of repository types that can be identified from the extension of a url

Type

array

$all_git_urls

$all_git_urls : array

An array used to store all the Git internal urls

Type

array

Methods

checkForRepository()

checkForRepository(string  $extension) : string

Checks repository type based on extension

Parameters

string $extension

extension to check

Returns

string —

$repository_type the repository type determined from the extension of the url
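The lookup described above can be pictured with a minimal sketch. The constant values, the contents of the lookup table and the function name checkForRepositorySketch are all assumptions made for illustration; this reference does not show the real values.

```php
<?php
// Hypothetical stand-ins for the class constants documented above;
// the real values are not given in this reference.
const INDICATOR_NONE = "none";
const INDICATOR_GIT = "git";

/**
 * Maps a url extension to a repository type.
 * The lookup table below is an assumption made for this sketch.
 */
function checkForRepositorySketch(string $extension): string
{
    $repository_types = ["git" => INDICATOR_GIT];
    // Accept ".git", "git" and "GIT" alike.
    $extension = strtolower(ltrim($extension, "."));
    return $repository_types[$extension] ?? INDICATOR_NONE;
}

echo checkForRepositorySketch(".git"), "\n";
echo checkForRepositorySketch("html"), "\n";
```

Returning a "none" indicator rather than an empty string keeps the result usable as a key when deciding which action, if any, to take for the url.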

setGitRepositoryUrl()

setGitRepositoryUrl(string  $url_to_check, integer  $counter, array  $seeds, array  $repository_indicator, array  $site_value, integer  $total_git_urls, array  $all_git_urls) : array

Sets up the seed sites with urls from a Git repository (updates these sites if downloading from the repository has already started)

Parameters

string $url_to_check

url that needs to be processed

integer $counter

to keep track of number of urls processed

array $seeds

stores sites that are ready to be downloaded

array $repository_indicator

indicates the type of the repository

array $site_value

contains original Git url crawled

integer $total_git_urls

number of urls in repository less those already processed

array $all_git_urls

current list of urls from git repository

Returns

array —

$git_internal_urls containing all the internal Git urls fetched from the parent Git url

fetchGitRepositoryUrl()

fetchGitRepositoryUrl(string  $url_to_check) : array

Get the Git internal urls from the parent Git url

Parameters

string $url_to_check

url that needs to be processed

Returns

array —

$git_next_urls list of Git internal urls which are called during the git clone

getGitMasterFile()

getGitMasterFile(string  $git_first_url_content, string  $git_base_url) : string

Get the second Git url, which points to the Git master tree structure

Parameters

string $git_first_url_content

contents of the first Git url

string $git_base_url

common portion of Git urls

Returns

string —

$git_next_url the second internal Git url

getGitMasterTree()

getGitMasterTree(string  $git_second_url_content, string  $git_base_url) : string

Get the third Git url, which contains information about the organization of the entire Git repository

Parameters

string $git_second_url_content

contents of the second Git url

string $git_base_url

common portion of git urls

Returns

string —

$git_next_url the third internal Git url
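As background for this step: an uncompressed Git commit object contains a line of the form "tree" followed by a 40-character SHA-1, which names the root tree. The constants GIT_MASTER_TREE_HASH_START and GIT_MASTER_TREE_HASH_END suggest the class reads this hash from fixed positions. The sketch below uses a regular expression instead, which is a substitution made for clarity, and the name extractTreeHash is hypothetical.

```php
<?php
/**
 * Extracts the tree SHA-1 from the text of an uncompressed commit object.
 * Returns null if no tree line is found.
 */
function extractTreeHash(string $commit_content): ?string
{
    // Drop the "commit <size>\0" header if it is present.
    $null_position = strpos($commit_content, "\0");
    if ($null_position !== false) {
        $commit_content = substr($commit_content, $null_position + 1);
    }
    // The tree line is "tree " followed by 40 lowercase hex digits.
    if (preg_match('/^tree ([0-9a-f]{40})$/m', $commit_content, $matches)) {
        return $matches[1];
    }
    return null;
}

// Made-up commit text; the size in the header is illustrative only.
$commit = "commit 172\0tree " . str_repeat("ab", 20) . "\n" .
    "author A U Thor <author@example.com> 0 +0000\n\nmessage\n";
echo extractTreeHash($commit), "\n";
```

The extracted hash can then be turned into the url of the tree object, which is the role the third Git url plays in this class.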

getNextGitUrl()

getNextGitUrl(string  $git_url, string  $compression_indicator) : string

Get the Git content from a url, which will be used to get the next Git url

Parameters

string $git_url

Git url to extract contents from

string $compression_indicator

indicator for compressed or uncompressed contents

Returns

string —

$git_object_content contents extracted from the url
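Loose Git objects are stored as zlib streams, so the compressed case usually amounts to inflating the downloaded bytes. The following is a minimal sketch of that step only; the function name inflateGitObject and the boolean flag standing in for $compression_indicator are assumptions, and it requires PHP's zlib extension.

```php
<?php
/**
 * Returns the content of a loose Git object, inflating it when asked.
 * $compressed is a hypothetical boolean stand-in for $compression_indicator.
 */
function inflateGitObject(string $raw, bool $compressed): string
{
    if (!$compressed) {
        return $raw;
    }
    // Loose Git objects are zlib streams, which gzuncompress() decodes.
    // The @ suppresses the warning emitted for corrupt input.
    $inflated = @gzuncompress($raw);
    return ($inflated === false) ? "" : $inflated;
}

// Round trip on a made-up blob object: header, null byte, then content.
$object = "blob 5\0hello";
$raw = gzcompress($object);
var_dump(inflateGitObject($raw, true) === $object);
```

Returning an empty string on corrupt input is a design choice for this sketch; a caller that must distinguish failure from an empty object would need a different signal.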

getObjects()

getObjects(string  $git_object_content, string  $git_base_url) : array

Get the Git blob and tree objects

Parameters

string $git_object_content

compressed content of git master tree file

string $git_base_url

common portion of Git url

Returns

array —

$blob_url contains information and url for git blob objects
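For orientation, a Git tree object is a sequence of entries, each made of a mode (what this page calls an access code), a space, a name, a null byte and a 20-byte binary SHA-1. The sketch below walks such a body and converts each binary hash to hex. The function name parseTreeEntries and the returned array shape are assumptions, and treating every mode other than 40000 as a blob is a simplification.

```php
<?php
/**
 * Parses an uncompressed Git tree object into a list of entries.
 * Each entry is "<mode> <name>\0" followed by a 20-byte binary SHA-1.
 */
function parseTreeEntries(string $tree_content): array
{
    // Skip the "tree <size>\0" header if it is present.
    if (strncmp($tree_content, "tree ", 5) === 0) {
        $header_end = strpos($tree_content, "\0");
        if ($header_end === false) {
            return [];
        }
        $tree_content = substr($tree_content, $header_end + 1);
    }
    $length = strlen($tree_content);
    $position = 0;
    $entries = [];
    while ($position < $length) {
        $space = strpos($tree_content, " ", $position);
        $null = ($space === false) ? false :
            strpos($tree_content, "\0", $space);
        // Stop on a truncated entry: 20 hash bytes must follow the null byte.
        if ($space === false || $null === false || $null + 21 > $length) {
            break;
        }
        $mode = substr($tree_content, $position, $space - $position);
        $entries[] = [
            // Simplification: mode 40000 is a folder, anything else a blob.
            "type" => ($mode === "40000") ? "tree" : "blob",
            "mode" => $mode,
            "name" => substr($tree_content, $space + 1, $null - $space - 1),
            "sha" => bin2hex(substr($tree_content, $null + 1, 20)),
        ];
        $position = $null + 21;
    }
    return $entries;
}

// Made-up tree body with one file and one folder.
$body = "100644 README\0" . hex2bin(str_repeat("01", 20)) .
    "40000 src\0" . hex2bin(str_repeat("02", 20));
print_r(parseTreeEntries($body));
```

Entries of type "tree" are the ones that need a further fetch, since their hash points to another tree object describing a nested folder.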

checkPosition()

checkPosition(string  $git_blob_position, string  $git_tree_position, string  $git_object_content) : array

Checks the positions of the access codes for null values

Parameters

string $git_blob_position

first occurrence of the Git blob access code

string $git_tree_position

first occurrence of the Git tree access code

string $git_object_content

compressed content of git master tree

Returns

array —

$git_object_positions length of the compressed content after the access code

readBlobSha()

readBlobSha(string  $git_object_content, string  $blob_position, string  $length, string  $git_base_url) : array

Get the details of the blob file, i.e. the blob file name, SHA hash and content

Parameters

string $git_object_content

compressed content of git master tree

string $blob_position

first occurrence of the Git blob access code in $git_object_content

string $length

length of the compressed content of git master tree

string $git_base_url

common portion of git url

Returns

array —

$git_blob_content contains details of git blob object

readTreeSha()

readTreeSha(string  $git_object_content, string  $tree_position, string  $length, string  $git_base_url) : array

Get the details of the tree file, i.e. the folder name, SHA hash and blob url inside the tree

Parameters

string $git_object_content

compressed content of git master tree

string $tree_position

first occurrence of the Git tree access code in $git_object_content

string $length

length of the compressed content of git master tree

string $git_base_url

common portion of git url

Returns

array —

$git_tree_content contains details of the Git tree object

checkNestedStructure()

checkNestedStructure(string  $sha_hash, string  $git_base_url) : string

Checks the nested structure inside git tree object

Parameters

string $sha_hash

sha of the git tree object

string $git_base_url

common portion of the parent git url

Returns

string —

$blob_url contains the url of the blob file inside the folder

urlMaker()

urlMaker(string  $sha_hash, string  $git_base_url) : string

Makes the git clone internal url for blob objects

Parameters

string $sha_hash

SHA hash of the Git blob object

string $git_base_url

common portion of git url

Returns

string —

$git_object_url contains the complete url of the blob file
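Git stores a loose object under a folder named after the first two hex digits of its SHA-1, in a file named after the remaining 38 digits. This is most likely what GIT_URL_SPLIT and the GIT_FOLDER_NAME_* and GIT_FILE_NAME_* constants describe. The sketch below shows the construction; the literal "objects/" component is an assumption standing in for GIT_URL_OBJECT, and the function name makeObjectUrl is hypothetical.

```php
<?php
/**
 * Builds the url of a loose Git object from its 40-character SHA-1.
 * The "objects/" path component is an assumption standing in for
 * GIT_URL_OBJECT, whose value is not shown in this reference.
 */
function makeObjectUrl(string $sha_hash, string $git_base_url): string
{
    $folder = substr($sha_hash, 0, 2); // first two hex digits: folder name
    $file = substr($sha_hash, 2);      // remaining 38 digits: file name
    return rtrim($git_base_url, "/") . "/objects/" . $folder . "/" . $file;
}

echo makeObjectUrl(
    "0123456789abcdef0123456789abcdef01234567",
    "https://example.com/repo.git/"
), "\n";
```

Trimming the trailing slash before joining means the base url can be passed with or without it.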

getGitData()

getGitData(string  $git_url) : string

Makes the cURL call to get the contents

Parameters

string $git_url

url to download the contents from

Returns

string —

$git_content actual content of the git url
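A download of this kind can be sketched with PHP's cURL extension as follows. The function name fetchUrlContents and the 5 second default are illustrative assumptions; the documented CURL_TIMEOUT and CURL_TRANSFER constants presumably supply the corresponding option values in the class, but their values are not shown here.

```php
<?php
/**
 * Downloads the contents of a url with cURL.
 * The 5 second timeout is an illustrative value, not the documented
 * CURL_TIMEOUT, whose value is not shown in this reference.
 */
function fetchUrlContents(string $git_url, int $timeout = 5): string
{
    $handle = curl_init($git_url);
    if ($handle === false) {
        return "";
    }
    // Return the response body as a string instead of printing it.
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
    // Give up if the whole transfer takes longer than $timeout seconds.
    curl_setopt($handle, CURLOPT_TIMEOUT, $timeout);
    $content = curl_exec($handle);
    return ($content === false) ? "" : $content;
}
```

Since Git objects are binary, the returned string should be treated as raw bytes and inflated before any text processing.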