I have created a new summarizer, the centroid weighted summarizer, and integrated it into the Yioop system. Along with it, I added code to detect Content Management Systems (CMS) such as WordPress; the HTML processor now runs this detection code on each site it sees. The CMS detectors have tests that validate their detection. I also updated a few of the existing summarizers so they can output their summaries, which allows ROUGE tests to be automated.

Author diesal9 <charles.bocage@sjsu.edu>
Author date 2015-10-23 05h
Author local date 2015-10-22 22h -0700
Committer Chris Pollett <chris@pollett.org>
Committer date 2015-11-09 20h
Committer local date 2015-11-09 12h -0800
Commit 9613beb55473047d740f593ef598e65b3fc381a3
Tree 468db3cf2b20f64dde2860b9f026484d099482a2
Parent 4d08150529fc6f042d582ab243470d8344279a45

Signed-off-by: Chris Pollett <chris@pollett.org>
Affected files:
src/controllers/components/CrawlComponent.php
src/executables/Fetcher.php
src/library/CrawlConstants.php
src/library/cmsdetector/CmsBase.php
src/library/cmsdetector/CmsDetector.php
src/library/cmsdetector/DrupalFrameworkDetector.php
src/library/cmsdetector/WordpressFrameworkDetector.php
src/library/cmsdetector/simple_html_dom.php
src/library/processors/HtmlProcessor.php
src/library/processors/PageProcessor.php
src/library/processors/TextProcessor.php
src/library/summarizers/CentroidSummarizer.php
src/library/summarizers/CentroidWeightedSummarizer.php
src/library/summarizers/GraphBasedSummarizer.php
src/library/summarizers/ScrapeSummarizer.php
src/library/summarizers/Summarizer.php
src/locale/en_US/configure.ini
tests/CmsDetectorTest.php
tests/test_files/cms_detector/Drupal01.txt
tests/test_files/cms_detector/Drupal02.txt
tests/test_files/cms_detector/Drupal03.txt
tests/test_files/cms_detector/Drupal04.txt
tests/test_files/cms_detector/Wordpress01.txt
tests/test_files/cms_detector/Wordpress02.txt
tests/test_files/cms_detector/Wordpress03.txt
tests/test_files/cms_detector/Wordpress04.txt
tests/test_files/cms_detector/cms_detector_input.txt
tests/test_files/cms_detector/cms_detector_results.txt