Changes in Version 6
(June 13, 2019)
- Crawler and Search Engine
- Trending keywords now available under More and Tools link.
- Support for multiple simultaneous crawls by assigning machines to channels and then scheduling crawls to those channels.
- Support for general repeating crawls. These crawls have a repeat frequency and two indexes: one for searching and one for crawling. Yioop automatically switches between the two every repeat period.
- Support for crawling to a fixed depth directly rather than using a regex in allowable sites to crawl.
- Dropdown to allow admins to control how Yioop should follow robots.txt files.
- Under Page Options can now test how pages will be processed by URI, File Upload, or Direct Input.
- Safe search check box added to Settings and enabled by default.
- Fixes issues with HTTP/2 crawling on Linux.
- Improves Mirror server handling.
- Removes Memcache support as cache option for search results
- Indexing and Library Functionality
- Width, Height, EXIF, and XMP meta data now indexed for images, and media:image-small, media:image-medium, media:image-large meta words added.
- Improved language and safe website detection. Now also supports mul locale tag.
- Adds stopWordsRemover method to all supported locales' Tokenizer class.
- New LinearAlgebra class added to make it easier to do term vector manipulations both for summarizers and in using Yioop as a Library under Composer.
- All summarizers rewritten. Each sentence for each summarizer now gets a score before being added to summary.
This score is also used in ranking search results.
- A Test link for Search Sources added to allow easy testing of whether a source is being correctly downloaded.
- Adds new Scrape Podcast search source to allow downloading of podcasts to wiki pages.
- Web Scraper order of application now determined by a priority field.
- Web Scrapers enhanced so they can now extract fields like THUMB_URL or other meta words, such as video duration.
I.e., replaces functionality that was previously only poorly served by video search sources.
- Removes video search sources from search sources.
- Adds Library class with init method to make it easier to initialize Yioop when used with Composer.
- Under Page Options, adds a toggle to control whether phrase extraction, rather than just term extraction, is always done. In most circumstances, not using phrase extraction gives faster and better indexing.
- Removes two copies of dictionary info, one in IndexShard and one in IndexDictionary, thus making for smaller indexes.
- Cache pages now stored with summary in same object, allowing more compression if keeping cache of whole pages.
- Removes materialized metas and largely unused thesaurus functionality.
- Group and Wiki System
- Adds a seen media indicator in media lists, which can be user reset.
- Improved inter-group links.
- If wiki url has 360 in path, checks for 360 images and adds an enter VR button to view them.
- Media updater now has a job that allows periodic downloading of podcasts to a wiki page.
- Time zone, Cookie name, and Session token now set under Security rather than Appearance; time
before autologout now controllable by admin using dropdown.
Changes in Version 5
(May 30, 2018)
- Crawler and Search Engine
- Now runs without the need of a separate web server.
- Improved robots.txt handling
- Curl flags for HTTP/2
- Starting crawls has been simplified: it now auto-starts the queue server and fetchers
- Credits can be used to buy access to wiki groups in addition to keyword ads
- Group creators can charge credits for joining wiki groups.
- Group and Wiki System
- Group and thread recommendation system added
- Analytic system supports differential privacy
- Support for encrypted groups
- Simplified Chat Bot support
- Can display CSV files as spreadsheets with equation support
- Can display embedded CSV cells in wiki pages as tables or charts
- Supports data url resources in Wiki pages
- Video subtitling support enabled using VTT files
Changes in Version 4
(Feb 28, 2016)
- User-defined Web Scrapers : These can be used to detect web pages of a certain type, for example, pages generated by WordPress, and then, when generating a summary, focus only on the standard area in such a document where content is found.
- New Summarizers : Two new summarizers available to pick what content on a page should be indexed: A weighted-centroid approach and a page-rank-like, sentence-graph approach.
- Media Sources Enhancements : media sources can be used to say what content should be crawled periodically. Previously, these were aggregated under a news search source and allowed for rss, atom, and scraped html feed types. A media keyword field has been added so one can aggregate these data into user-defined search sources besides news, and a new regex scraper feed type has been added.
- Image Thumbnail Caching System : allows news feed results to be served completely over SSL.
- Question Answer System for English : for some question queries Yioop will now list a possible question answer in the returned search results. This relies on an improved part-of-speech tagger for English. A Hindi part-of-speech tagger is now also part of Yioop.
- General Stability Enhancements for Fetchers and QueueServers
- Analytics : a subsystem to give information about query counts as well as wiki page and discussion group views has now been added.
- Differential Privacy : a module which can be turned on or off to use differential privacy techniques to add randomness to aggregate statistics from the analytics system to reduce the risk that individual user privacy is compromised.
- Better Management of Users and groups
- Wiki Page Enhancements : Adds two new page types: Templates and Wiki-feed pages. New dropdown-based navigation. Allows for user-defined relationships between pages and has a simple graphical way to explore relationships. Gallery pages can now display Epub and PDF files in browse mode, in addition to images, audio and video files. Tools to move page media resources around have also been improved.
- Group Discussion Enhancements : can import RSS feeds of discussions from other discussion boards.
- Chat Bots : A first implementation of a chat bot api for the group discussion feeds. Weatherbot example provided.
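
As a rough illustration of the Differential Privacy item above, the sketch below adds Laplace noise to a single aggregate count. This is an assumption about the general technique, not Yioop's actual code (Yioop is written in PHP and its implementation is not shown in these notes); the function name `noisy_count` and the `epsilon` parameter are hypothetical.

```python
import random


def noisy_count(true_count, epsilon, rng=None):
    """Return true_count plus Laplace noise of scale 1/epsilon.

    Hypothetical helper for illustration only; it assumes each user
    changes the count by at most 1 (sensitivity 1).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    scale = 1.0 / epsilon
    # The difference of two independent exponential samples with mean
    # `scale` follows a Laplace distribution centered at 0.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


if __name__ == "__main__":
    # Report a page-view statistic with some randomness added.
    print(noisy_count(250, epsilon=0.5))
```

A smaller `epsilon` gives noisier statistics and stronger privacy; a larger one keeps reported counts closer to the true values.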
Changes in Version 3.1
(Sep. 24, 2015)
- Adds support for Keyword Advertising and its own unique ad keyword pricing model. Findcan.ca demonstrates this in action and now supports sign up for advertisements.
- The keyword advertising system integrates with a payment processing script available for download for a fee. This script uses Stripe.com to handle credit card transactions.
- Yioop has been rewritten to work with the popular PHP package manager known as Composer and Yioop is available from the composer package repository https://packagist.org. This should make it easier for people to develop projects using Yioop's natural language processing facilities.
- Yioop's MediaUpdater process has been rewritten so that it can run in a distributed fashion and now supports re-encoding to mp4 of videos uploaded to the wiki system and group feed system. It also supports sending out notification emails. Previously, sending these emails had been done exclusively by the web app.
- In addition to the centroid-based and ad-hoc web page summarizers, there is now a new graph-based summarizer that can be used during crawling.
- Arabic, Dutch, Hindi, Persian, and Portuguese stemmers have been added.
Changes in Version 2.1
(Mar 1, 2015)
- Fixes some security issues in Version 2.0 with regard to checking allowed activities of a user.
- Improves the accuracy of how Yioop counts the number of documents containing a word or phrase
- Improves email notifications from group feeds.
- Adds number of groups column to manage user lists
- Adds number of users info to manage groups lists
- Fixes a number of places where the Yioop code was generating Notices.
Changes in Version 2
(Jan. 25, 2015)
- New integrated wiki help system throughout software
- Search and Crawling
- Adds Docx support. Now for zipped formats like Office, Yioop can use
a partial Zip extractor to extract content even if the whole file was not downloaded.
- Adds support for rel canonical meta tag
- Adds French, Spanish, German, Russian stemmers
- Adds support for Gopher protocol
- Word filter plugin can apply domain and url specific rules
- Improved scheduling of page download based on number of DNS lookups
- Improved handling of robots.txt files when site in question is congested
- arc_tool supports count recalculation and url suggestion injection
- News/Media Updater
- Updater now has a scraper for HTML pages with news
- Updater can now extract images from news feeds.
- Updater can now auto-convert video files to mp4 and webm
- Feeds and Wikis
- Adds ability to drag and drop images, video, and other documents in posts and wiki pages
- Besides standard wiki pages, Yioop now supports slide presentation pages, media gallery pages, and page aliases
- +/- Voting available on Group posts
- Can configure so that posts expire after a certain amount of time
- Improved mail messages associated with posting
- Can set meta tag info for wiki pages as well as common header and footers
- Can embed search bars into wiki pages
- Math Mode works in posts and wiki pages
- Web Sites
- Can use Configure activity to set a site's look and feel, from icons to background to timezone, etc.
- Can Configure to use wiki system for default landing page
- GUI for adding Ad Server scripts
- Version appeared Jun 14, 2014
Origins Of Yioop
- July 10, 2011, git repository started.
- Software was first released publicly (non-repo based), August, 2010.
- November, 2009, project begun.