AnalyticsManager |
Used to set and get SQL query and search query timing statistics
between models and index_bundle_iterators |
BloomFilterBundle |
A BloomFilterBundle is a directory of BloomFilterFiles. |
BloomFilterFile |
Code used to manage a bloom filter in memory and in a file. |
BrowserRunner |
Used to execute browser-based JavaScript and render browser pages from PHP. |
BTNode |
Class for B-Tree nodes |
BTree |
This class implements the B-Tree data structure for storing key-value
pairs with integer keys, based on the algorithms in Introduction to
Algorithms, by T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein,
Second Edition, 2001, The MIT Press. |
BZip2BlockIterator |
This class is used to allow one to iterate through a Bzip2 file. |
ComputerVision |
Class used to encapsulate various methods related to computer
vision that might be useful for indexing documents. These
include recognizing text in images. |
CrawlDaemon |
Used to run scripts as a daemon on *nix systems |
DoubleIndexBundle |
A DoubleIndexBundle encapsulates and provides methods for two
IndexArchiveBundles used to store a repeating crawl. One of these bundles
is used to handle current search queries, while the other is used to store
an ongoing crawl. Once the crawl time has been reached, the roles of the two
bundles are swapped. |
FeedArchiveBundle |
Subclass of IndexArchiveBundle with bloom filters to make it easy to check
if a news feed item has been added to the bundle already before adding it |
FetchGitRepositoryUrls |
Library of functions used to fetch Git internal urls |
FetchUrl |
Code used to manage HTTP or Gopher requests from one or more URLs |
FileCache |
Library of functions used to implement a simple file cache |
HashTable |
Code used to manage a memory efficient hash table.
Weights for the queue must be floats |
IndexArchiveBundle |
Encapsulates a set of web page summaries and an inverted word-index of terms
from these summaries which allows one to search for summaries containing a
particular word. |
IndexDictionary |
Data structure used to store entries of the form:
word id, index shard generation, posting list offset, and length of
posting list. It has entries for all words stored in a given
IndexArchiveBundle. There might be multiple entries for a given word_id
if it occurs in more than one index shard in the given IndexArchiveBundle. |
IndexManager |
Class used to manage open IndexArchiveBundles while performing
a query. Provides an easy place to obtain references to these bundles
and ensures only one object per bundle is instantiated in a Singleton-esque
way. |
JavascriptUnitTest |
Super class of all the test classes testing Javascript functions. |
Library |
A class used to ensure one can autoload non utility and locale functions when
using Yioop as a Composer library. Also lets one set the debug level |
LinearAlgebra |
Class useful for handling linear algebra operations on associative arrays
with key => value pairs where the value is a number. |
MailServer |
A small class for communicating with an SMTP server. Used to avoid
configuration issues that might arise with PHP's built-in mail()
function. |
Mod9Constants |
Mini-class (so not in its own file) used to hold encoding and decoding
information related to Mod9 encoding (a variant of Simplified-9 specific to Yioop). |
NamedEntityContextTagger |
Machine learning based named entity recognizer. |
NWordGrams |
Library of functions used to create and extract n word grams |
PageRuleParser |
Has methods to parse user-defined page rules to apply to documents
to be indexed. |
PartialZipArchive |
Used to extract files from an initial segment or a fragment of a
ZIP Archive. |
PersistentStructure |
A PersistentStructure is a data structure which is saved to secondary
storage (such as disk) after every so many operations. |
PhraseParser |
Library of functions used to manipulate words and phrases |
PriorityQueue |
Code used to manage a memory efficient priority queue. |
ScraperManager |
Class used by html processors to detect if a page matches a particular
signature such as that of a content management system, and
also to provide scraping mechanisms for the content of such a page |
StringArray |
Memory efficient implementation of persistent arrays |
SuffixTree |
Data structure used to maintain a suffix tree for a passage of words. |
Trie |
Implements a trie data structure which can be used to store terms read
from a dictionary in a succinct way |
UnitTest |
Base class for all the SeekQuarry/Yioop engine Unit tests |
UrlParser |
Library of functions used to manipulate and to extract components from urls |
VersionManager |
VersionManager can be used to create and manage versions of files in a folder
so that a user can revert the files to any desired version, back to the
time the folder was first placed under management. It is used by Yioop's
Wiki system to handle versions of image and other media resources for a
Wiki page. |
WebArchive |
Code used to manage web archive files |
WebArchiveBundle |
A web archive bundle is a collection of web archives which are managed
together. It is useful to split data across several archive files rather than
just store it in one, both for read efficiency and to keep file sizes from
getting too big. In some places we are using 4 byte ints to store file
offsets, which restricts the size of the files we can use for web archives. |
WebQueueBundle |
Encapsulates the data structures needed to have a queue of URLs to crawl |
WikiParser |
Class with methods to parse MediaWiki documents, both within Yioop and
when Yioop indexes MediaWiki dumps such as those from Wikipedia. |
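
The WebArchiveBundle entry above notes that 4 byte integer file offsets restrict archive file size. The following is a minimal sketch of that arithmetic in Python, not Yioop's own code. The summary does not say whether the offsets are signed or unsigned, so both limits are shown, and the names `MAX_UNSIGNED_OFFSET`, `MAX_SIGNED_OFFSET`, and `fits_in_4_bytes` are hypothetical.

```python
import struct

# Assumption: offsets are stored as fixed-width 4-byte integers.
# The source does not state signedness, so both bounds are given.
MAX_UNSIGNED_OFFSET = 2**32 - 1  # 4,294,967,295 bytes, just under 4 GiB
MAX_SIGNED_OFFSET = 2**31 - 1    # 2,147,483,647 bytes, just under 2 GiB


def fits_in_4_bytes(offset: int, signed: bool = False) -> bool:
    """Return True if offset can be packed into a 4-byte integer."""
    fmt = ">i" if signed else ">I"  # big-endian signed / unsigned 32-bit
    try:
        struct.pack(fmt, offset)
        return True
    except struct.error:
        # struct refuses values outside the 32-bit range
        return False
```

Under these assumptions, a single archive file could not be addressed past roughly 2 GiB or 4 GiB, which is one reason to split data across several archive files.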