countCompanyLevelDomainsInCommonDetectFarm()
countCompanyLevelDomainsInCommonDetectFarm(string $url, array $links, integer $threshold = 200) : integer
Returns the number of links in the array $links which
which share the same company level domain (cld) as $url
For www.yahoo.com the cld is yahoo.com, for
www.theregister.co.uk it is theregister.co.uk. It is
similar for organizations. It also tries to determine
if a $url is potentially part of a link farm. To do this
it checks (1) if the number of distinct, not sub-locale domains with a
shared company domain is high > $threshold/2. This suggest a lot of bogus
outgoing links that are all under one company's control. For example, a site
www.foo.com linking to md5_hash.foo.com for many different md5 hashes.
If this is detected this method returns -1. This method also returns -1
if (2) there seem to be lots of links ($threshold) from the current
domain to a single domain that shares the same company domain. This
might indicate a domain md5_hash.foo.com with lots of links to a domain
www.foo.com
Parameters
string |
$url |
the url to compare against $links |
array |
$links |
an array of urls |
integer |
$threshold |
number above which if either situation (1) or (2)
above happens then deem site spam |
Returns
integer
— the number of times $url shares the cld with a
link in $links. If thinks part of link farm returns -1