Changelog

Changes in Version 9.4

(Aug 30, 2023)
  • Crawler and Search Engine
    • Sliding window used to handle messages between Fetcher and QueueServer.
    • Fixes an Expect header bug when downloading web pages
  • Admin, Group, Wiki, and Yioop Interface
    • Add a new moderation group and the ability to flag posts
    • Adds a mechanism for the owner of a group to move posts between threads
    • Improves the UI for changing passwords and for password recovery
    • Fixes a bug in how the autologout mechanism was implemented
    • Fixes PHP 8.2 deprecations

Changes in Version 9.3

(Dec 21, 2022)
  • Crawler and Search Engine
    • Improve efficiency of query impression collection.
  • Admin, Group, Wiki, and Yioop Interface
    • Adds a new description source type to search sources, allowing wiki folder or resource names to be used to download a description of the resource from the internet. Associated with it is a new DescriptionUpdateJob used by the media updater.
    • Fixes a bug where public wiki pages could be reverted to earlier versions of the page without authorization. (Security)
    • Improves the recommendation job by allowing recommendations of wiki resources. It also replaces the earlier pure TF-IDF approach to recommendations with a Hash2Vec approach.
    • Fixes filenames produced by PodcastDownloaderJob and fixes notices in TrendingHighlightJob.
    • Adds differential privacy to mask the exact number of users in a group.
    • Fixes issues where WeatherBot had ceased to work.
    • Attempts to speed up queries that count groups, messages in groups, etc.

Changes in Version 9.2

(Sep 17, 2022)
  • Crawler and Search Engine
    • Fixes bug that prevented MediaUpdater from working correctly.
  • Admin, Group, Wiki, and Yioop Interface
    • In the Appearance activity, the auxiliary.css textarea has been replaced by a more general UI for creating and editing CSS themes. These themes can also be selected on a page-by-page basis for wiki pages.
    • Adds the ability to create wiki pages with form elements on them. Form data is written to a form_data.csv file in the page's resources.
    • Adds badges for crawl control and social control buttons. Crawl badges indicate the number of ongoing crawls versus the total number of crawls, the number of machines, and the number of crawl mixes. Social badges indicate the number of unread messages, the number of unread posts to threads, and the total number of groups a user belongs to.
    • Update Wiki Syntax page.

Changes in Version 9.1

(August 8, 2022)
  • Crawler and Search Engine
    • Improves the Results editor by allowing the Admin role to edit results directly from a search engine result page, rather than going through the menu system.
    • Fixes bugs related to crawling on Windows
    • Improvements to the clarity of the search engine result scoring mechanism; adds additional scoring controls under Page Options
  • Indexing and Library Functionality
    • Improvements to the layout of search indexes which should hopefully make indexing faster.

Changes in Version 9.0

(July 17, 2022)
  • Crawler and Search Engine
    • Yioop's queueing system changed from an OPIC-based approach to a host-budgeting mechanism using logarithmic tiers and company-level domains.
    • Search engine result look-and-feel modernized; site favicons, if present, are downloaded and appear next to results.
    • Additional facts about a result are now hidden behind a ... icon.
    • Image Results page layout updated; results show the dimensions of the base image. Searches can now be filtered by image color.
    • Video Results page layout updated; videos indicate duration and video resolution. Searches can now be filtered by video resolution.
    • Add links to make it easier for a site admin to edit knowledge wiki callouts on result pages.
    • News result layout updated
    • Code modernization to eliminate PHP 8.1 Notices.
  • Indexing and Library Functionality
    • Now includes ngram entity detection filters for all locales.
    • TokenTool updated to support Wikimedia's newer data dump formats for page counts.
    • Greek locale support added including a Greek Stemmer.
    • Enhances WordFilterPlugin by adding support for terms of the form FILTER_TERM_ and FILTER_LIST_ coming from a Web Scraper.
    • Updates Arctool to support new indexing methods introduced in Version 8.
    • Work to reduce memory leaks in code; dictionary operations separated out into their own process.
    • Optimizations for posting unpacking.
    • Profiling tools for indexing methods to make future optimizations easier.
    • Adds links to the UI to make it easier to switch between the crawl and manage machines activities; other links make it faster to go from editing a scraper to testing its effects on a page.
    • Resolves some conjunctive query and quoted query bugs.
  • Admin, Group, Wiki, and Yioop Interface
    • Updated UI with icon based links to make it easier to navigate between activities.
    • Updated UI for activities related to creating, finding, and joining groups.
    • Messaging between users now supported.
    • Emoji picker for messages added.
    • Image resources on pages now lazy loaded.
    • Wiki media gallery pages of video resources output Open Graph info about the video.
    • Adds PDF thumbnail generation support for wiki resources if ImageMagick is installed.
    • Adds EPUB and HTML thumbnail generation support for wiki resources if Calibre is installed.
    • Updates the credit system so that credits can now be redeemed for cash.
    • Accessibility color scheme improvements.
    • Improvements to feed and wiki group bar bread crumbs.
    • Hamburger menu animation improved.
    • Removes direct invocation of PDFJS and EPUBJS code, to just use the pdfjs viewer and epubjs-reader instead.
    • Several database backend warnings fixed for users using Mysql or Postgres backends.

Changes in Version 8.0

(September 21, 2021)
  • Crawler and Search Engine
    • Crawler now sends Referer header
    • Generate animated GIF thumbnails for video if using FFMPEG
    • Fix crash issue related to etag processing
    • Improve UI for Search Sources to make it easier to add and test new sources
  • Indexing and Library Functionality
    • New PackedTableTools class to efficiently store integer records for several data structures
    • New LinearHashTable implementation using PackedTableTools records
    • Btree class replaced with BPlusTree using PackedTableTools records
    • WebArchiveBundle class replaced with PartitionDocumentBundle class using PackedTableTools records
    • IndexArchiveBundle format for crawls replaced with new IndexDocumentBundle format based on BPlusTree for inverted index, and PartitionDocumentBundle for storage
    • Arctool updated to the new indexing system and has a migrate option for migrating old indexes.
    • Fixes a Mysql issue caused by Mysql now reserving the GROUPS keyword
  • Admin, Group, Wiki, and Yioop Interface
    • Improvements to hamburger menu, more use of icons.
    • Increases accessibility functionality and improves how Yioop works in Lynx
    • Revamps UI for media list wiki pages, supporting list, grid, and details views as well as allowing control of sort order.
    • New clipboard functionality for editing Wiki Resources.
    • Add new url shortener and page wall wiki page types
    • Supports QR codes in Wiki pages
    • Modification to allow code to work in PHP 8
    • Fix security hole that allowed Public Group to post
    • Bug fixes to Yioop if redirects not on
    • Fixes XSS vulnerability in query string

Changes in Version 7.1

(August 13, 2020)
  • Crawler and Search Engine
    • Make News RSS feeds discoverable via link tag.
    • Fixes a bug in how fetcher selects next queue server when run in multi-queue server mode.
  • Indexing and Library Functionality
    • Adds a normalize function to make sure equivalent simplified and traditional characters are handled the same way.
    • translate-locale option of TokenTool enhanced to support translating Public and Help wiki pages.
  • Admin, Group, Wiki, and Yioop Interface
    • Reorganization of menus related to groups and wikis.
    • Continuous scroll for group discussions added.
    • Bug fix for the display of PDFs.
    • Reworks the UI so that it largely works when JavaScript is disabled and when both JavaScript and CSS are disabled.

Changes in Version 7

(June 26, 2020)
  • Crawler and Search Engine
    • Search results can be presented as continuous scroll or as paged results.
    • For privacy, tells the browser not to include the query in the referrer when clicking on search results.
    • Hamburger menu now adds quick access to narrow search by time, language, video duration, etc.
    • Landing page can be configured to show trending and news highlights from admin specified subsearches.
    • Knowledge Wiki callouts on search results, with a tool to create callouts manually and another tool to generate them from Wikipedia dumps.
    • Improved crawl result editor that makes it easier to filter results from searches, edit search snippets, and pin urls at top of search results.
    • Better display of trending news items, query and group statistics. Better charting of Trending News Items.
    • Speed up image loading in news item search results.
    • Improved link farm detection.
    • Which media updater jobs are running can now be controlled from the UI.
    • Adds Tesseract support for OCRing images in PDFs.
    • Improved processing of m3u8 for feed podcast downloads
    • Adds a slow start parameter that can be used to make sure a good copy of the seed sites is obtained before the general crawl starts.
    • Fixes critical bug in IP address handling introduced by changes to cURL library. Bug was causing many pages not to get crawled after cURL version changed.
    • Improved logging and log file rotation for crawling jobs.
  • Indexing and Library Functionality
    • Feed item storage moved out of database into more scalable FeedArchiveBundle class.
    • Adds a notion of direction to index bundle iterators so posting lists can now be scanned in both a forward (what 6.0.4 had) and a backward direction.
    • Improved Trending Calculations
    • New segmentator, named-entity recognizer, part-of-speech taggers for Chinese. Improved Chinese stop words.
    • Chinese language question answering implemented.
    • Fixes issues with Tor and proxy crawling.
  • Admin, Group, Wiki, and Yioop Interface
    • Add configurable cookie consent feature to UI for GDPR.
    • Cleans up the UI for groups so there is less awkward toggling between the group and admin views.
    • Tool to use Yandex Translate to quickly add localization strings for languages other than English.
    • Allow browsers to better cache page content by sending better Cache and Not Modified headers.
    • Reduces the mixture of URL rewriting in .htaccess versus PHP; now all URLs are routed through the main index.php and PHP does the rewriting.
    • Database query optimizations to speed up queries needed for display of Manage Accounts, and group and feed pages.
    • Simplified, more human-readable URL links between pages.
    • Removed Sharing and User Specified Crawl Mixes; crawl mixes are now restricted to the admin account.
    • Remove ZKP authentication.

Changes in Version 6

(June 13, 2019)
  • Crawler and Search Engine
    • Trending keywords now available under More and Tools link.
    • Support for multiple simultaneous crawls by assigning machines to channels and then scheduling crawls to those channels.
    • Support for general repeating crawls. These crawls have a repeat frequency and two indexes, one for searching and one for crawling, and Yioop automatically switches between the two every repeat period.
    • Support for crawling to a fixed depth directly rather than using a regex in the allowed sites to crawl.
    • Dropdown to allow admins to control how Yioop should follow robots.txt files.
    • Under Page Options can now test how pages will be processed by URI, File Upload, or Direct Input.
    • Safe search check box added to Settings and enabled by default.
    • Fixes issues with HTTP/2 crawling on Linux.
    • Improves Mirror server handling.
    • Removes Memcache support as cache option for search results
  • Indexing and Library Functionality
    • Width, height, EXIF, and XMP metadata are now indexed for images, and the meta words media:image-small, media:image-medium, and media:image-large have been added.
    • Improved language and safe website detection. Now also supports mul locale tag.
    • Adds stopWordsRemover method to all supported locales' Tokenizer class.
    • New LinearAlgebra class added to make it easier to do term vector manipulations both for summarizers and in using Yioop as a Library under Composer.
    • All summarizers rewritten. Each sentence for each summarizer now gets a score before being added to summary. This score is also used in ranking search results.
    • A Test link for Search Sources added to allow easy testing of whether a source is being downloaded correctly.
    • Adds new Scrape Podcast search source to allow downloading of podcasts to wiki pages.
    • Web Scraper order of application now determined by a priority field.
    • Web Scrapers enhanced so they can now extract fields like THUMB_URL or other meta words, such as video duration. This replaces functionality previously served only poorly by video search sources.
    • Removes video search sources from search sources.
    • Add Library class with init method to make it easier to initialize Yioop when used with Composer.
    • Under Page Options there is a toggle to control whether phrase extraction, rather than just term extraction, is always done. In most circumstances, not using phrase extraction gives faster and better indexing.
    • Removes the duplicate copy of dictionary info previously kept in both IndexShard and IndexDictionary, making for smaller indexes.
    • Cache pages are now stored with the summary in the same object, allowing more compression when caching whole pages
    • Removes materialized metas and largely unused thesaurus functionality.
  • Group and Wiki System
    • Adds a seen media indicator in media lists, which users can reset.
    • Improved inter-group links.
    • If wiki url has 360 in path, checks for 360 images and adds an enter VR button to view them.
    • Media updater now has a job that allows periodic downloading of podcasts to a wiki page.
    • Time zone, Cookie name, and Session token now set under Security rather than Appearance, time before autologout now controllable by admin using dropdown.

Changes in Version 5

(May 30, 2018)
  • Crawler and Search Engine
    • Now runs without the need of a separate web server.
    • Improved robots.txt handling
    • Curl flags for HTTP/2
    • Starting crawls has been simplified so that the queue server and fetchers are auto-started
  • Monetization
    • Credits can be used to buy access to wiki groups in addition to keyword ads
    • Group creators can charge credits for joining wiki groups.
  • Group and Wiki System
    • Group and thread recommendation system added
    • Analytic system supports differential privacy
    • Support for encrypted groups
    • Simplified Chat Bot support
    • Can display CSV files as spreadsheets with equation support
    • Can embed CSV cells in wiki pages and display them as tables or charts
    • Supports data url resources in Wiki pages
    • Video subtitling support enabled using VTT files

Changes in Version 4

(Feb 28, 2016)
  • User-defined Web Scrapers : These can be used to detect web pages of a certain type, for example, coming from Wordpress, and then when generating a summary, focus only on the standard area in such a document where content is found.
  • New Summarizers : Two new summarizers available to pick what content on a page should be indexed: A weighted-centroid approach and a page-rank-like, sentence-graph approach.
  • Media Sources Enhancements : media sources can be used to say what content should be crawled periodically. Previously these were aggregated under a news search source and allowed RSS, Atom, and scraped HTML feed types. A media keyword field has been added so one can aggregate these data into user-defined search sources besides news, and a new regex scraper feed type has been added.
  • Image Thumbnail Caching System : allows news feed results to be served completely over SSL.
  • Question Answer System for English : for some question queries Yioop will now list a possible question answer in the returned search results. This relies on an improved part-of-speech tagger for English. A Hindi part-of-speech tagger is now also part of Yioop.
  • General Stability Enhancements for Fetchers and QueueServers
  • Analytics : a subsystem to give information about query counts as well as wiki page and discussion group views has now been added.
  • Differential Privacy : a module which can be turned on or off to use differential privacy techniques to add randomness to aggregate statistics from the analytics system to reduce the risk that individual user privacy is compromised.
  • Better Management of Users and groups
  • Wiki Page Enhancements : Adds two new page types: Templates and Wiki-feed pages. New dropdown-based navigation. Allows for user-defined relationships between pages and has a simple graphical way to explore relationships. Gallery pages can now display EPUB and PDF files in browse mode, in addition to images, audio, and video files. Tools to move pages' media resources around have also been improved.
  • Group Discussion Enhancements : can import RSS feeds of discussions from other discussion boards.
  • Chat Bots : A first implementation of a chat bot api for the group discussion feeds. Weatherbot example provided.

Changes in Version 3.1

(Sep. 24, 2015)
  • Adds support for Keyword Advertising and its own unique ad keyword pricing model. Findcan.ca demonstrates this in action and now supports sign up for advertisements.
  • The keyword advertising system integrates with a payment processing script available for download for a fee. This script uses Stripe.com to handle credit card transactions.
  • Yioop has been rewritten to work with the popular PHP package manager known as Composer and Yioop is available from the composer package repository https://packagist.org. This should make it easier for people to develop projects using Yioop's natural language processing facilities.
  • Yioop's MediaUpdater process has been rewritten so that it can run in a distributed fashion and now supports recoding videos uploaded to the wiki system and group feed system to mp4. It also supports sending out notification emails, which had previously been done exclusively by the web app.
  • In addition to the centroid-based and ad-hoc web page summarizers, there is a new graph-based summarizer that can be used during crawling.
  • Arabic, Dutch, Hindi, Persian, and Portuguese stemmers have been added.

Changes in Version 2.1

(Mar 1, 2015)
  • Fixes some security issues in Version 2.0 with regard to checking allowed activities of a user.
  • Improves the accuracy of how Yioop counts the number of documents containing a word or phrase
  • Improves email notifications from group feeds.
  • Adds number of groups column to manage user lists
  • Adds number of users info to manage groups lists
  • Fixes a number of places where the Yioop code was generating Notices.

Changes in Version 2

(Jan. 25, 2015)
  • General
    • New integrated wiki help system throughout software
  • Search and Crawling
    • Adds Docx support. For zipped formats like Office, Yioop can now use a partial Zip extractor to extract content even if the whole file has not been downloaded.
    • Adds support for rel canonical meta tag
    • Adds French, Spanish, German, Russian stemmers
    • Adds support for Gopher protocol
    • Word filter plugin can apply domain and url specific rules
    • Improved scheduling of page download based on number of DNS lookups
    • Improved handling of robots.txt files when site in question is congested
    • arc_tool supports count recalculation and url suggestion injection
  • News/Media Updater
    • Updater now has a scraper for HTML pages with news
    • Updater can now extract images from news feeds.
    • Updater can now auto-convert video files to mp4 and webm
  • Feeds and Wikis
    • Adds ability to drag and drop images, video, and other documents in posts and wiki pages
    • Besides standard wiki pages, Yioop now supports slide presentation pages, media gallery pages, and page aliases
    • +/- Voting available on Group posts
    • Can configure so that posts expire after a certain amount of time
    • Improved mail messages associated with posting
    • Can set meta tag info for wiki pages as well as common header and footers
    • Can embed search bars into wiki pages
    • Math Mode works in posts and wiki pages
  • Web Sites
    • Can use the Configure activity to set a site's look and feel, from icons to background to timezone, etc.
    • Can Configure to use wiki system for default landing page
    • GUI for adding Ad Server scripts

Version 1

  • Version appeared Jun 14, 2014

Origins Of Yioop

  • July 10, 2011, git repository started.
  • Software was first released publicly (non-repo based), August, 2010.
  • November, 2009, project begun.