\seekquarry\yioop\tests

Classes

BloomFilterFileTest Used to test that the BloomFilterFile class provides the basic functionality of a persistent set, i.e., that we can insert things into it and do membership testing.
BTreeTest Yioop B-tree unit test class.
DeTokenizerTest Code used to test the German stemming algorithm.
DocxProcessorTest UnitTest for the DocxProcessor class. It is used to process docx files which are a zip of an xml-based format
EnTokenizerTest Code used to test the English stemming algorithm. The inputs for the algorithm are words in http://snowball.tartarus.org/algorithms/porter/voc.txt and the resulting stems are compared with the stem words in http://snowball.tartarus.org/algorithms/porter/output.txt. The code uses the original Porter stemmer, not Porter 2.
EpubProcessorTest UnitTest for the EpubProcessor class. An EpubProcessor is used to process a .epub (ebook publishing standard) file and extract a summary from it. This class tests the processing of the .epub file format by EpubProcessor.
EsTokenizerTest Code used to test the Spanish stemming algorithm. The inputs for the algorithm are words in http://snowball.tartarus.org/algorithms/spanish/voc.txt and the resulting stems are compared with the stem words in http://snowball.tartarus.org/algorithms/spanish/output.txt
FaTokenizerTest Code used to test the Persian stemming algorithm. The inputs for the algorithm came from the sample text file for the Hamshahri Collection found at http://ece.ut.ac.ir/DBRG/Hamshahri/download.html. The stemmed results come from the Java program that the PHP stemmer is based on, at http://members.unine.ch/jacques.savoy/clef/persianStemmerArabic.txt
FetchUrlTest Used to test auxiliary functions related to downloading pages with the FetchUrl class.
FrTokenizerTest Code used to test the French stemming algorithm. The inputs for the algorithm are words in http://snowball.tartarus.org/algorithms/french/voc.txt and the resulting stems are compared with the stem words in http://snowball.tartarus.org/algorithms/french/output.txt
HashTableTest Used to test that the HashTable class properly stores key-value pairs and handles inserts, deletes, and collisions correctly. It should also detect when the table is full.
HiTokenizerTest Code used to test the Hindi stemming algorithm. The inputs for the algorithm came from a sample text file. The stemmed results come from the Java program that the PHP stemmer is based on, at http://members.unine.ch/jacques.savoy/clef/HindiStemmerLight.java.txt, which has since been modified to try to improve accuracy.
IndexDictionaryTest Used to test that the IndexDictionary class can properly add shards and retrieve correct posting slice ranges in the shards.
IndexShardTest Used to test that the IndexShard class can properly add new documents and retrieve those documents by word. Checks that doc offsets can be updated and that shards can be saved and reloaded.
ItTokenizerTest Code used to test the Italian stemming algorithm. The inputs for the algorithm are words in http://snowball.tartarus.org/algorithms/italian/voc.txt and the resulting stems are compared with the stem words in http://snowball.tartarus.org/algorithms/italian/output.txt
NlTokenizerTest Code used to test the Dutch stemming algorithm. The inputs for the algorithm are words in http://snowball.tartarus.org/algorithms/Dutch/voc.txt and the resulting stems are compared with the stem words in http://snowball.tartarus.org/algorithms/Dutch/output.txt
PdfProcessorTest UnitTest for the PdfProcessor class. A PdfProcessor is used to process a .pdf file and extract a summary from it. This class tests the processing of a .pdf file.
PhantomjsUiTest Used to test the UI using PhantomJs.
PhraseParserTest Used to test the PhraseParser class. Checks that bigram extraction works correctly.
PptxProcessorTest UnitTest for the PptxProcessor class. It is used to process pptx files which are a zip of an xml-based format
PriorityQueueTest Used to test the PriorityQueue class that is used to figure out which URL to crawl next
PtTokenizerTest Code used to test the Portuguese stemming algorithm. The inputs for the algorithm are words in http://snowball.tartarus.org/algorithms/porter/voc.txt and the resulting stems are compared with the stem words in http://snowball.tartarus.org/algorithms/porter/output.txt. The code uses the original Porter stemmer, not Porter 2.
QueueServerTest Used to test functions related to scheduling websites to crawl for a web crawl (the responsibility of a QueueServer)
RuTokenizerTest Code used to test the Russian stemming algorithm. The inputs for the algorithm are words in http://snowball.tartarus.org/algorithms/russian/voc.txt and the resulting stems are compared with the stem words in http://snowball.tartarus.org/algorithms/russian/output.txt
ScraperManagerTest Code used to test Web Scrapers.
Sha1JavascriptTest Used to test the Javascript implementation of the sha1 function.
StringArrayTest Used to test that the StringArray class properly stores/retrieves values, and can handle loading and saving
TrieTest Used to test that the Trie class properly stores words that could be used for an autosuggest dictionary
UrlParserTest Used to test the UrlParser class. For now, checks that the method canonicalLink works correctly and that isPathMemberRegexPaths (used in robot_processor.php) works.
UtilityTest Used to test the various methods in utility, in particular, those related to posting lists and time.
VersionManagerTest UnitTests for the VersionManager class.
WebArchiveTest UnitTest for the WebArchive class. A web archive is used to store array-based objects persistently to a file. This class tests storing and retrieving from such an archive.
WebQueueBundleTest UnitTest for the WebQueueBundle class.
XlsxProcessorTest Used to test that the XlsxProcessor class provides the basic functionality of getting the title, description, languages and links.
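
Several of the tokenizer tests above share one pattern: run the stemmer over a list of input words and compare each result with a list of expected stems. The following is a minimal sketch of that comparison, not the Yioop implementation. It is written in Python for illustration, and the names find_stem_mismatches and toy_stem, as well as the inline sample lists standing in for voc.txt and output.txt, are assumptions made for this example.

```python
from typing import Callable, Iterable, List, Tuple


def find_stem_mismatches(
    stem: Callable[[str], str],
    words: Iterable[str],
    expected_stems: Iterable[str],
) -> List[Tuple[str, str, str]]:
    """Return (word, expected, actual) for every word whose stem differs."""
    mismatches = []
    # Pair the i-th input word with the i-th expected stem.
    for word, expected in zip(words, expected_stems):
        actual = stem(word)
        if actual != expected:
            mismatches.append((word, expected, actual))
    return mismatches


def toy_stem(word: str) -> str:
    """Hypothetical stand-in stemmer: drops a trailing 's' from words
    longer than three letters. This is NOT the Porter algorithm."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


# Tiny inline samples standing in for the voc.txt / output.txt word lists.
sample_words = ["cats", "dog", "houses", "bus"]
sample_stems = ["cat", "dog", "house", "bus"]

mismatches = find_stem_mismatches(toy_stem, sample_words, sample_stems)
```

Collecting the mismatches rather than stopping at the first failure makes it easier to see how many words a stemmer gets wrong in a single run.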

Functions

getScraperEntries()

This function has an array of Web Scrapers.