Class: UrlParser
Source Location: /lib/url_parser.php
Library of functions used to manipulate URLs and to extract components from them
Author(s):
Class Details
Class Methods
static method canonicalLink [line 591]
static string canonicalLink(string $link, string $site, [string $no_fragment = true])
Given a $link that was obtained from a website $site, returns a complete URL for that link. For example, the $link some_dir/test.html on the $site http://www.somewhere.com/bob would yield the complete URL http://www.somewhere.com/bob/some_dir/test.html
Tags:
Parameters:
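The example in the description can be reproduced with a minimal sketch. The function name sketchCanonicalLink is hypothetical and this is not the UrlParser implementation; it assumes a well-formed $site, treats the site's path as a directory, and does not resolve "." or ".." segments.

```php
<?php
// Hypothetical helper illustrating the documented behaviour; not the library code.
function sketchCanonicalLink(string $link, string $site, bool $no_fragment = true): string
{
    if ($no_fragment) {
        // Drop everything from the first "#" onward.
        $link = explode('#', $link, 2)[0];
    }
    // A link that already carries a scheme is treated as complete.
    if (preg_match('/^[a-z][a-z0-9+.-]*:\/\//i', $link)) {
        return $link;
    }
    $parts = parse_url($site);
    $root = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $root .= ':' . $parts['port'];
    }
    if (strncmp($link, '/', 1) === 0) {
        // Host-relative link: resolve against the site root.
        return $root . $link;
    }
    // Assumption: the site's path is a directory, so the link is appended to it.
    return $root . rtrim($parts['path'] ?? '', '/') . '/' . $link;
}

// Prints http://www.somewhere.com/bob/some_dir/test.html
echo sketchCanonicalLink('some_dir/test.html', 'http://www.somewhere.com/bob'), "\n";
```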
static method checkRecursiveUrl [line 699]
static bool checkRecursiveUrl(string $url, [int $repeat_threshold = 3])
Checks if a URL has a repeated set of subdirectories and whether the number of repeats reaches a threshold. A pattern like bob/.../bob counts as one repetition; bob/.../alice/.../bob/.../alice would count as two (... should be read as an ellipsis, not a directory name). If the threshold is three and there are at least three repeated matches, this function returns true; it returns false otherwise.
Tags:
Parameters:
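One way to read the counting rule above is to count every path segment that has already appeared earlier in the same path. The sketch below follows that reading; the name sketchHasRecursivePath is hypothetical, and the library may count repeats differently.

```php
<?php
// Hypothetical helper: one possible reading of the repeat-counting rule.
function sketchHasRecursivePath(string $url, int $repeat_threshold = 3): bool
{
    // parse_url yields null when there is no path; the cast turns that into "".
    $path = (string) parse_url($url, PHP_URL_PATH);
    $segments = array_filter(explode('/', $path), fn($s) => $s !== '');
    $seen = [];
    $repeats = 0;
    foreach ($segments as $segment) {
        if (isset($seen[$segment])) {
            // This directory name occurred earlier in the path.
            $repeats++;
        }
        $seen[$segment] = true;
    }
    return $repeats >= $repeat_threshold;
}

var_dump(sketchHasRecursivePath('http://a.com/bob/x/alice/y/bob/z/alice'));
var_dump(sketchHasRecursivePath('http://a.com/bob/x/alice/y/bob/z/alice', 2));
```

With the default threshold the first call reports false (two repeats); lowering the threshold to two makes the second call report true.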
static method cleanRedundantLinks [line 825]
static array cleanRedundantLinks(array $links, string $parent_url)
Deletes links from the array $links if they are the same as the site they came from (or are otherwise judged irrelevant).
Tags:
Parameters:
static method getDocumentFilename [line 518]
static string getDocumentFilename(string $url)
Gets the filename portion of a url if present; otherwise returns "Some File"
Tags:
Parameters:
static method getDocumentType [line 490]
static string getDocumentType(string $url)
Given a url, makes a guess at the file type of the file it points to
Tags:
Parameters:
static method getFragment [line 561]
static string getFragment(string $url)
Gets the fragment component of a URL
Tags:
Parameters:
static method getHost [line 116]
static the getHost(string $url, [$with_login_and_port = true], bool $with_login)
Gets the host name portion of a URL if present; otherwise returns false
Tags:
Parameters:
static method getHostPaths [line 299]
static array getHostPaths(string $url)
Gets an array of prefix URLs from a given URL. Each prefix contains at least the hostname of the start URL. For example, http://host.com/b/c/ would yield http://host.com/, http://host.com/b, http://host.com/b/, http://host.com/b/c, http://host.com/b/c/
Tags:
Parameters:
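The prefix list in the description can be generated as in the following minimal sketch. The name sketchHostPaths is hypothetical, the input is assumed to be a well-formed URL with a scheme and host, and this is not the UrlParser code.

```php
<?php
// Hypothetical helper that reproduces the prefix list from the description.
function sketchHostPaths(string $url): array
{
    $parts = parse_url($url);
    $prefix = $parts['scheme'] . '://' . $parts['host'] . '/';
    $prefixes = [$prefix];
    $path = $parts['path'] ?? '/';
    $segments = explode('/', trim($path, '/'));
    $ends_with_slash = substr($path, -1) === '/';
    $last = count($segments) - 1;
    foreach ($segments as $i => $segment) {
        if ($segment === '') {
            continue; // empty path or doubled slash
        }
        $prefix .= $segment;
        $prefixes[] = $prefix;
        // Add the slash-terminated form unless this is a final segment
        // that had no trailing slash in the original URL.
        if ($i < $last || $ends_with_slash) {
            $prefix .= '/';
            $prefixes[] = $prefix;
        }
    }
    return $prefixes;
}

print_r(sketchHostPaths('http://host.com/b/c/'));
```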
static method getHostSubdomains [line 336]
static array getHostSubdomains(string $url)
Gets the subdomains of the host portion of a url. So http://a.b.c/d/f/ will return a.b.c, .a.b.c, b.c, .b.c, c, .c
Tags:
Parameters:
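The subdomain list in the description follows a simple pattern: emit the host, then the host with a leading dot, then drop the leftmost label and repeat. A minimal sketch, with the hypothetical name sketchHostSubdomains (not the library implementation):

```php
<?php
// Hypothetical helper that reproduces the list from the description.
function sketchHostSubdomains(string $url): array
{
    $host = (string) parse_url($url, PHP_URL_HOST);
    if ($host === '') {
        return [];
    }
    $labels = explode('.', $host);
    $subdomains = [];
    while ($labels) {
        $name = implode('.', $labels);
        $subdomains[] = $name;       // e.g. "b.c"
        $subdomains[] = '.' . $name; // e.g. ".b.c"
        array_shift($labels);        // drop the leftmost label
    }
    return $subdomains;
}

print_r(sketchHostSubdomains('http://a.b.c/d/f/'));
```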
static method getLang [line 158]
static the getLang(string $url)
Attempts to guess the language tag based on the URL
Tags:
Parameters:
static method getPath [line 267]
static the getPath(string $url, [bool $with_query_string = false])
Gets the path portion of a URL if present; otherwise returns NULL
Tags:
Parameters:
static method getQuery [line 543]
static string getQuery(string $url)
Gets the query string component of a URL
Tags:
Parameters:
static method getWordsIfHostUrl [line 418]
static string getWordsIfHostUrl(string $url)
Given a URL, extracts the words in the host part of the URL, provided the URL has no path part beyond "/". Ignores a leading www and also ignores the TLD. For example, "http://www.yahoo.com/" returns " yahoo "
Tags:
Parameters:
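A minimal sketch of the extraction described above. The name sketchWordsIfHostUrl is hypothetical, and returning an empty string when the URL has a longer path or no host is an assumption; the description does not say what the library returns in that case.

```php
<?php
// Hypothetical helper; the empty-string fallback is an assumption.
function sketchWordsIfHostUrl(string $url): string
{
    $path = (string) parse_url($url, PHP_URL_PATH);
    if ($path !== '' && $path !== '/') {
        return ''; // URL points below the host root
    }
    $host = (string) parse_url($url, PHP_URL_HOST);
    if ($host === '') {
        return '';
    }
    $labels = explode('.', $host);
    if (count($labels) > 1) {
        array_pop($labels); // ignore the TLD
    }
    if ($labels && $labels[0] === 'www') {
        array_shift($labels); // ignore a leading www
    }
    if (!$labels) {
        return '';
    }
    // Words are space-separated and padded with a space on each side.
    return ' ' . implode(' ', $labels) . ' ';
}

// Prints " yahoo " between the brackets.
echo '[', sketchWordsIfHostUrl('http://www.yahoo.com/'), "]\n";
```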
static method getWordsLastPathPartUrl [line 454]
static string getWordsLastPathPartUrl(string $url)
Given a URL, extracts the words in the last path part of the URL. For example, http://us3.php.net/manual/en/function.array-filter.php yields " function array filter "
Tags:
Parameters:
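The example above is consistent with dropping the file extension and splitting the rest on non-alphanumeric characters. The sketch below makes exactly those two assumptions; the name sketchWordsLastPathPart is hypothetical and the library's actual splitting rules may differ.

```php
<?php
// Hypothetical helper; the extension-stripping and split rules are assumptions.
function sketchWordsLastPathPart(string $url): string
{
    $path = (string) parse_url($url, PHP_URL_PATH);
    // Last path component without its final extension.
    $name = pathinfo(basename($path), PATHINFO_FILENAME);
    // Split on any run of characters that are not ASCII letters or digits.
    $words = preg_split('/[^a-z0-9]+/i', $name, -1, PREG_SPLIT_NO_EMPTY);
    if (!$words) {
        return '';
    }
    return ' ' . implode(' ', $words) . ' ';
}

// Prints " function array filter " between the brackets.
echo '[', sketchWordsLastPathPart('http://us3.php.net/manual/en/function.array-filter.php'), "]\n";
```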
static method hasHostUrl [line 102]
static bool hasHostUrl(string $url)
Checks if the url has a host part.
Tags:
Parameters:
static method isLocalhostUrl [line 727]
static bool isLocalhostUrl(string $url)
Checks if a $url is on localhost
Tags:
Parameters:
static method isPathMemberRegexPaths [line 365]
static bool isPathMemberRegexPaths(string $path, array $robot_paths)
Checks if $path matches any of the robots.txt-style regex paths in $robot_paths
Tags:
Parameters:
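As an illustration of robots.txt-style matching, the sketch below assumes the common conventions: a pattern matches as a path prefix, "*" stands for any run of characters, and a trailing "$" anchors the end of the path. Both the name sketchPathMatchesRobotPaths and these conventions are assumptions; the documentation does not spell out the pattern syntax the library accepts.

```php
<?php
// Hypothetical helper; the "*" and trailing "$" conventions are assumptions.
function sketchPathMatchesRobotPaths(string $path, array $robot_paths): bool
{
    foreach ($robot_paths as $pattern) {
        if ($pattern === '') {
            continue; // an empty rule matches nothing
        }
        $anchored = substr($pattern, -1) === '$';
        if ($anchored) {
            $pattern = substr($pattern, 0, -1);
        }
        // Escape regex metacharacters, then turn the escaped "*" into ".*".
        $regex = str_replace('\*', '.*', preg_quote($pattern, '#'));
        $regex = '#^' . $regex . ($anchored ? '$' : '') . '#';
        if (preg_match($regex, $path) === 1) {
            return true;
        }
    }
    return false;
}

var_dump(sketchPathMatchesRobotPaths('/docs/a.pdf', ['/private/', '/*.pdf$']));
```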
static method isSchemeHttpOrHttps [line 56]
static bool isSchemeHttpOrHttps(string $url)
Checks if the url scheme is either http or https.
Tags:
Parameters:
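A scheme check of this kind can be sketched with PHP's built-in parse_url. The name sketchIsHttpOrHttps is hypothetical, and the case-insensitive comparison is an assumption rather than documented behaviour.

```php
<?php
// Hypothetical helper built on parse_url; not the library implementation.
function sketchIsHttpOrHttps(string $url): bool
{
    // parse_url yields null when no scheme is present; the cast gives "".
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return $scheme === 'http' || $scheme === 'https';
}

var_dump(sketchIsHttpOrHttps('https://example.com/'));
var_dump(sketchIsHttpOrHttps('ftp://example.com/'));
```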
static method isVideoUrl [line 798]
static bool isVideoUrl(&$url, array $video_prefixes, string $url)
Checks if a URL corresponds to a known playback page of a video sharing site
Tags:
Parameters:
static method simplifyUrl [line 76]
static string simplifyUrl(string $url, [int $max_len = 0])
Converts a URL with a scheme into one without. Also removes trailing slashes from the URL. If necessary, shortens the URL to the desired length by replacing part of it with an ellipsis
Tags:
Parameters:
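The three steps in the description can be sketched as follows. The name sketchSimplifyUrl is hypothetical; placing the ellipsis in the middle, writing it as three ASCII dots, and treating $max_len = 0 as "no limit" are all assumptions, since the description does not say where the ellipsis goes.

```php
<?php
// Hypothetical helper; ellipsis placement and the meaning of 0 are assumptions.
function sketchSimplifyUrl(string $url, int $max_len = 0): string
{
    // Remove a leading "scheme://" if there is one.
    $short = (string) preg_replace('/^[a-z][a-z0-9+.-]*:\/\//i', '', $url);
    $short = rtrim($short, '/');
    if ($max_len > 3 && strlen($short) > $max_len) {
        $keep = $max_len - 3;          // characters left after the "..."
        $head = intdiv($keep + 1, 2);  // kept from the start
        $tail = $keep - $head;         // kept from the end
        $short = substr($short, 0, $head) . '...'
            . ($tail > 0 ? substr($short, -$tail) : '');
    }
    return $short;
}

// Prints www.example.com
echo sketchSimplifyUrl('http://www.example.com/'), "\n";
```

The sketch measures length in bytes, so multibyte URLs would need the mb_ string functions instead.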
static method urlMemberSiteArray [line 759]
static mixed urlMemberSiteArray(string $url, array $site_array, [bool $return_rule = false])
Checks if the URL belongs to one of the sites listed in $site_array. Sites can be given either in the form domain:host or in the form of a URL, in which case it is checked that the site URL is a substring of the passed URL.
Tags:
Parameters:
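The two site forms can be illustrated with a minimal sketch. The name sketchUrlMemberSiteArray is hypothetical. How a domain:host entry is compared is an assumption here (the host must equal the domain or be a subdomain of it), as is returning the matching entry when $return_rule is true.

```php
<?php
// Hypothetical helper; the domain: matching rule is an assumption.
function sketchUrlMemberSiteArray(string $url, array $site_array, bool $return_rule = false)
{
    $host = (string) parse_url($url, PHP_URL_HOST);
    foreach ($site_array as $site) {
        if (strncmp($site, 'domain:', 7) === 0) {
            $domain = ltrim(substr($site, 7), '.');
            $suffix = '.' . $domain;
            // Host equals the domain or ends with ".domain".
            $match = $domain !== '' && ($host === $domain
                || substr($host, -strlen($suffix)) === $suffix);
        } else {
            // URL form: the site must occur as a substring of the URL.
            $match = $site !== '' && strpos($url, $site) !== false;
        }
        if ($match) {
            return $return_rule ? $site : true;
        }
    }
    return false;
}

$sites = ['domain:example.com', 'http://www.somewhere.com/bob'];
var_dump(sketchUrlMemberSiteArray('http://news.example.com/a', $sites));
var_dump(sketchUrlMemberSiteArray('http://www.somewhere.com/bob/x', $sites, true));
```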