Initial pass at coding guidelines, a=chris

Chris Pollett [2013-01-02]
Files changed:
en-US/pages/about.thtml
en-US/pages/documentation.thtml
en-US/pages/downloads.thtml
en-US/pages/home.thtml
en-US/pages/install.thtml
en-US/pages/resources.thtml
diff --git a/en-US/pages/about.thtml b/en-US/pages/about.thtml
index 6cff760..5119429 100755
--- a/en-US/pages/about.thtml
+++ b/en-US/pages/about.thtml
@@ -1,21 +1,21 @@
-<h1>About SeekQuarry/Yioop!</h1>
-<p>SeekQuarry is the parent site for <a href="http://www.yioop.com/">Yioop!</a>.
-Both SeekQuarry and Yioop! were written mainly by myself, <a
+<h1>About SeekQuarry/Yioop</h1>
+<p>SeekQuarry is the parent site for <a href="http://www.yioop.com/">Yioop</a>.
+Both SeekQuarry and Yioop were written mainly by myself, <a
 href="http://www.cs.sjsu.edu/faculty/pollett">Chris Pollett</a>. The project
 began in Nov. 2009 and had its first publicly available release in August,
 2010.
 </p>

-<h1>The Yioop! and SeekQuarry Names</h1>
+<h1>The Yioop and SeekQuarry Names</h1>
 <p>When looking for names for my search engine I was originally
 thinking about using the name SeekQuarry which hadn't been
-registered. After deciding that I would use Yioop! for the name
+registered. After deciding that I would use Yioop for the name
 of my search engine site, I decided I would use SeekQuarry as a
 site to publish the software that is used in the Yioop engine.
-That is, yioop.com is a live site that demonstrates the open
+That is, yioop.com is a live site that demonstrates the open
 source search engine software distributed on the seekquarry.com
 site.</p>
-<p>The name Yioop! has the following history:
+<p>The name Yioop has the following history:
 I was looking for names that hadn't already been registered. My
 wife is Vietnamese, so I thought I might have better luck with
 Vietnamese words since all the English ones seemed to have been taken.
@@ -23,61 +23,64 @@ I started with the word giup, which is the way to spell 'help'
 in Vietnamese if you remove the accents. It was already taken.
 Then I tried yoop, which is my lame way of pronouncing how
 giup sounds in English. It was already taken. So then I
-combined the two to get Yioop!</p>
+combined the two to get Yioop.</p>

 <h1>Dictionary Data</h1>
 <p>
-<a href="http://en.wikipedia.org/wiki/Bloom_Filter">Bloom filters</a> for
-n grams on the Yioop! test site were generated using
+<a href="http://en.wikipedia.org/wiki/Bloom_Filter">Bloom filters</a> for
+n-grams on the Yioop test site were generated using
 <a href="http://dumps.wikimedia.org/other/pagecounts-raw/">Wikimedia
-Page View Statistics</a>.
-<a href="http://en.wikipedia.org/wiki/Trie">Trie</a>'s for word suggestion
+Page View Statistics</a>.
+<a href="http://en.wikipedia.org/wiki/Trie">Trie</a>'s for word suggestion
 for all languages other than Vietnamese were built
 using the <a href="http://en.wiktionary.org/wiki/Wiktionary:Frequency_lists"
->Wiktionary Frequency List</a>. These are available under a
-<a href="http://creativecommons.org/licenses/by-sa/3.0/">Creative
-Commons Share Alike 3.0 Unported License</a> as described on <a
+>Wiktionary Frequency List</a>. These are available under a
+<a href="http://creativecommons.org/licenses/by-sa/3.0/">Creative
+Commons Share Alike 3.0 Unported License</a> as described on <a
 href="http://en.wikipedia.org/wiki/Wikipedia:Database_download">Wikipedia's
 Download page</a>. The derived data files (if they were created for that
-language) for a language IANA tag, locale-tag, can be found in the
-locale/locale-tag/resources folder of the Yioop! project. These
+language) for a language IANA tag, locale-tag, can be found in the
+locale/locale-tag/resources folder of the Yioop project. These
 are also licensed using the same license. For Vietnamese,
-I used the following <a
+I used the following <a
 href="http://www.informatik.uni-leipzig.de/~duc/software/misc/wordlist.html">
-Vietnamese Word List</a> obtained with permision from <a
+Vietnamese Word List</a> obtained with permission from <a
 href="http://www.informatik.uni-leipzig.de/~duc/">Ho Ngoc Duc</a>.
 </p>

 <h1>Additional Credits</h1>
 <p>
 Several people helped
-with localization: Mary Pollett,
-Jonathan Ben-David, Andrea Brunetti,
+with localization: Mary Pollett,
+Jonathan Ben-David, Ismail.B, Andrea Brunetti,
 Thanh Bui, Sujata Dongre, Animesh Dutta,
- Youn Kim, Akshat Kukreti, Vijeth Patil, Chao-Hsin Shih,
-and Sugi Widjaja. Thanks to Ravi Dhillon, Akshat Kukreti, Tanmayee Potluri,
-Shawn Tice, and Sandhya Vissapragada for
-creating patches for Yioop! issues. Several of my master's students have done
-projects related to Yioop!: Amith Chandranna, Priya Gangaraju, Ismail.B,
-Vijaya Pamidi, Vijeth Patil, and Vijaya Sinha. Amith's code related to an
-Online version of the HITs algorithm is not currently in the main branch of
-Yioop!, but it is obtainable from
-<a href="http://www.cs.sjsu.edu/faculty/pollett/masters/<?php
-?>Semesters/Spring10/amith/index.shtml">Amith Chandranna's student page</a>.
-Vijaya Pamidi developed a Firefox web traffic extension for Yioop!
-Her code is also obtainable from <a href="http://www.cs.sjsu.edu/faculty/<?php
-?>pollett/masters/Semesters/Fall10/vijaya/index.shtml">Vijaya Pamidi's
-master's pages</a>. <a href="http://www.cs.sjsu.edu/faculty/pollett/<?php
-?>masters/Semesters/Fall11/vijeth/index.shtml">Vijeth Patil's Project</a>
+ Youn Kim, Akshat Kukreti, Vijeth Patil, Chao-Hsin Shih,
+Ahmed Kamel Taha, and Sugi Widjaja. Thanks to Ravi Dhillon, Akshat Kukreti,
+Tanmayee Potluri, Shawn Tice, and Sandhya Vissapragada for
+creating patches for Yioop issues. Several of my master's students have done
+projects related to Yioop: Amith Chandranna, Priya Gangaraju,
+Vijaya Pamidi, Vijeth Patil, and Vijaya Sinha. Amith's code related to an
+online version of the HITS algorithm is not currently in the main branch of
+Yioop, but it is obtainable from
+<a href="http://www.cs.sjsu.edu/faculty/pollett/masters/
+Semesters/Spring10/amith/index.shtml">Amith Chandranna's student page</a>.
+Vijaya Pamidi developed a Firefox web traffic extension for Yioop.
+Her code is also obtainable from <a href="http://www.cs.sjsu.edu/faculty/
+pollett/masters/Semesters/Fall10/vijaya/index.shtml">Vijaya Pamidi's
+master's pages</a>. <a href="http://www.cs.sjsu.edu/faculty/pollett/
+masters/Semesters/Fall11/vijeth/index.shtml">Vijeth Patil's Project</a>
 involved adding support for Twitter and RSS feeds to add additional real-time
-search results to the standard search results. This is not currently in main
-branch. <a href="http://www.cs.sjsu.edu/faculty/pollett/<?php
-?>masters/Semesters/Spring11/amith/index.shtml">Vijaya Sinha's Project</a>
-concerned using Open Street Map data in Yioop!. This code is not currently
+search results to the standard search results. This is not currently in the
+main branch. <a href="http://www.cs.sjsu.edu/faculty/pollett/
+masters/Semesters/Spring11/amith/index.shtml">Vijaya Sinha's Project</a>
+concerned using Open Street Map data in Yioop. This code is not currently
 in the main branch. Priya's code served as the
-basis for the plugin feature currently in Yioop! Shawn Tice's CS288
-project served as the basis of a rewrite of the archive crawl feature of Yioop!
-for the multi-queue server setting. The following other
-students have  created text processors for Yioop!: Nakul Natu (pptx),
-Vijeth Patil (epub), and Tarun Pepira (xslx).
+basis for the plugin feature currently in Yioop. Shawn Tice's CS288
+project served as the basis of a rewrite of the archive crawl feature of Yioop
+for the multi-queue server setting. Sandhya Vissapragada's Master's project served
+as the basis for the autosuggest and spell checking functionality in Yioop.
+The following other students have created text processors for Yioop: Nakul
+Natu (pptx), Vijeth Patil (epub), and Tarun Pepira (xlsx). Akshat Kukreti
+created the Italian language stemmer based on the Snowball version at
+<a href="http://tartarus.org">http://tartarus.org</a>.
 </p>
diff --git a/en-US/pages/documentation.thtml b/en-US/pages/documentation.thtml
index 0bda91d..e334544 100755
--- a/en-US/pages/documentation.thtml
+++ b/en-US/pages/documentation.thtml
@@ -1,5 +1,5 @@
 <div class="docs">
-<h1>Yioop! Documentation v 0.90</h1>
+<h1>Yioop Documentation v 0.92</h1>
     <h2 id='toc'>Table of Contents</h2>
     <ul>
         <li><a href="#quick">Preface: Quick Start Guides</a></li>
@@ -7,8 +7,8 @@
         <li><a href="#requirements">Requirements</a></li>
         <li><a href="#installation">Installation and Configuration</a></li>
         <li><a href="#files">Summary of Files and Folders</a></li>
-        <li><a href="#interface">The Yioop! Search and User Interface</a></li>
-        <li><a href="#mobile">Yioop! Mobile Interface</a></li>
+        <li><a href="#interface">Yioop Search and User Interface</a></li>
+        <li><a href="#mobile">Yioop Mobile Interface</a></li>
         <li><a href="#passwords">Managing Accounts</a></li>
         <li><a href="#userroles">Managing Users and Roles</a></li>
         <li><a href="#crawls">Managing Crawls</a></li>
@@ -17,338 +17,344 @@
         <li><a href="#editor">Results Editor</a></li>
         <li><a href="#sources">Search Sources</a></li>
         <li><a href="#machines">GUI for Managing Machines and Servers</a></li>
-        <li><a href="#localizing">Localizing Yioop! to a New Language</a></li>
-        <li><a href="#framework">Building a Site using Yioop! as Framework</a>
+        <li><a href="#localizing">Localizing Yioop to a New Language</a></li>
+        <li><a href="#framework">Building a Site using Yioop as Framework</a>
         </li>
-        <li><a href="#embedding">Embedding Yioop! in an Existing Site</a></li>
-        <li><a href="#customizing">Customizing Yioop!</a></li>
-        <li><a href="#commandline">Yioop! Command-line Tools</a></li>
+        <li><a href="#embedding">Embedding Yioop in an Existing Site</a></li>
+        <li><a href="#customizing">Customizing Yioop</a></li>
+        <li><a href="#commandline">Yioop Command-line Tools</a></li>
         <li><a href="#references">References</a></li>
     </ul>
     <h2 id="quick">Preface: Quick Start Guides</h2>
     <p>This document serves as a detailed description of the
-    Yioop search engine. If you want to get started using Yioop! now,
+    Yioop search engine. If you want to get started using Yioop now,
     but perhaps in less detail, you might want to first read the
     <a href="?c=main&p=install">Installation
     Guides</a> page.
     </p>
     <h2 id="intro">Introduction</h2>
-    <p>The Yioop! search engine is designed to allow users
+    <p>The Yioop search engine is designed to allow users
     to produce indexes of a web-site or a collection of
    web-sites. The number of pages a Yioop index can handle ranges from small
    sites to those containing tens or hundreds of millions of pages. In contrast,
-    a search-engine like Google maintains an index
-    of tens of billions of pages. Nevertheless, since you, the user, have
-    control over the exact sites which are being indexed with Yioop!, you have
-    much better control over the kinds of results that a search will return.
-    Yioop! provides a traditional web interface to do queries, an rss api,
-    and a function api. In this section we discuss some of the different
-    search engine technologies which exist today, how Yioop! fits into this
-    eco-system, and when Yioop! might be the right choice for your search
-    engine needs. In the remainder of this document after the introduction,
-    we discuss how to get and install Yioop!; the files and folders used
-    in Yioop!; user, role, search, subsearch, crawl,
-     and machine management in the Yioop! system;
-    localization in the Yioop! system; building a site using the Yioop!
-    framework; embedding Yioop! in an existing web-site;
-    customizing Yioop!; and the Yioop! command-line tools.
+    a search-engine like Google maintains an index
+    of tens of billions of pages. Nevertheless, since you, the user, have
+    control over the exact sites which are being indexed with Yioop, you have
+    much better control over the kinds of results that a search will return.
+    Yioop provides a traditional web interface to do queries, an RSS API,
+    and a function API. In this section we discuss some of the different
+    search engine technologies which exist today, how Yioop fits into this
+    ecosystem, and when Yioop might be the right choice for your search
+    engine needs. In the remainder of this document after the introduction,
+    we discuss how to get and install Yioop; the files and folders used
+    in Yioop; user, role, search, subsearch, crawl,
+     and machine management in the Yioop system;
+    localization in the Yioop system; building a site using the Yioop
+    framework; embedding Yioop in an existing web-site;
+    customizing Yioop; and the Yioop command-line tools.
     </p>
     <p>Since the mid-1990s a wide variety of search engine technologies
     have been explored. Understanding some of this history is useful
-    in understanding Yioop! capabilities. In 1994, Web Crawler, one of the
-    earliest still widely-known search engines, only had an
-    index of about 50,000 pages which was stored in an Oracle database.
+    in understanding Yioop's capabilities. In 1994, WebCrawler, one of the
+    earliest still widely-known search engines, only had an
+    index of about 50,000 pages which was stored in an Oracle database.
     Today, databases are still used to create indexes for small to medium size
-    sites. An example of such a search engine written in PHP is
+    sites. An example of such a search engine written in PHP is
     <a href="http://www.sphider.eu/">Sphider</a>. Given that a database is
     being used, one common way to associate a word with a document is to
    use a table with columns like word id, document id, score. Even if
-    one is only extracting about a hundred unique words per page,
+    one is only extracting about a hundred unique words per page,
     this table's size would need to be in the hundreds of millions for even
     a million page index. This edges towards the limits of the capabilities
-    of database systems although techniques like table sharding can help to
-    some degree. The Yioop! engine uses a database to manage some things
+    of database systems, although techniques like table sharding can help to
+    some degree. The Yioop engine uses a database to manage some things
     like users and roles, but uses its own web archive format and indexing
     technologies to handle crawl data. This is one of the reasons that
-    Yioop! can scale to larger indexes.</p>
+    Yioop can scale to larger indexes.</p>
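+    <p>As a rough illustration of the database approach just described
+    (a hypothetical sketch using PDO with Sqlite, not Yioop's actual
+    schema), such a postings table might be created and queried as
+    follows:</p>
+<pre>
+&lt;?php
+// Toy inverted index table: one row per (word, document) pair.
+$db = new PDO('sqlite:toy_index.db');
+$db->exec("CREATE TABLE IF NOT EXISTS postings (
+    word_id INTEGER, doc_id INTEGER, score REAL)");
+// Record that word 42 appears in document 7 with some relevance score
+$insert = $db->prepare(
+    "INSERT INTO postings (word_id, doc_id, score) VALUES (?, ?, ?)");
+$insert->execute(array(42, 7, 1.5));
+// Find the ten highest scoring documents for word 42
+$lookup = $db->prepare("SELECT doc_id, score FROM postings
+    WHERE word_id = ? ORDER BY score DESC LIMIT 10");
+$lookup->execute(array(42));
+print_r($lookup->fetchAll(PDO::FETCH_ASSOC));
+</pre>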
     <p>When a site that is being indexed consists of dynamic pages rather than
     the largely static page situation considered above, and those dynamic
     pages get most of their text content from a table column or columns,
     different search index approaches are often used. Many database management
-    systems like <a href="http://www.mysql.com">MySQL</a>, support the ability
+    systems like <a href="http://www.mysql.com">MySQL</a>/<a
+    href="https://mariadb.org/">MariaDB</a>, support the ability
    to create full text indexes for text columns. A faster, more robust approach
-    is to use a stand-alone full text index server such as <a
-    href="http://www.sphinxsearch.com/">Sphinx</a>. However, for these
-    approaches to work the text you are indexing needs to be in a database
-    column or columns, or have an easy to define "XML mapping". Nevertheless,
+    is to use a stand-alone full text index server such as <a
+    href="http://www.sphinxsearch.com/">Sphinx</a>. However, for these
+    approaches to work, the text you are indexing needs to be in a database
+    column or columns, or have an easy-to-define "XML mapping". Nevertheless,
     these approaches illustrate another
     common thread in the development of search systems: Search as an appliance,
     where you either have a separate search server and access it through either
-    a web-based API or through function calls. Yioop! has both a search
-    function API as well as a web API that returns
+    a web-based API or through function calls. Yioop has both a search
+    function API as well as a web API that returns
     <a href="http://www.opensearch.org">Open Search RSS results</a>. These
-    can be used to embed Yioop! within your existing site. If you want to
-    create a new search engine site, Yioop! offers a web-based,
+    can be used to embed Yioop within your existing site. If you want to
+    create a new search engine site, Yioop offers a web-based,
     model-view-controller framework with a web-interface for localization
     that can serve as the basis for your app.
     </p>
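+    <p>For example (a hedged sketch: the exact query parameters for RSS
+    output may differ between Yioop versions, so the url below is an
+    assumption), a PHP page on an existing site might pull Open Search
+    RSS results from a Yioop instance like so:</p>
+<pre>
+&lt;?php
+// Fetch Open Search RSS results from a hypothetical Yioop instance
+$url = "http://www.example.com/yioop/?q=" . urlencode("open source search")
+    . "&amp;f=rss";
+$rss = simplexml_load_string(file_get_contents($url));
+foreach ($rss->channel->item as $item) {
+    // each RSS item has the usual title/link/description fields
+    echo $item->title . " -- " . $item->link . "\n";
+}
+</pre>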
     <p>
-    By 1997 commercial sites like Inktomi and AltaVista already had
+    By 1997 commercial sites like Inktomi and AltaVista already had
     tens or hundreds of millions of pages in their
-    indexes [<a href="#P1994">P1994</a>] [<a href="#P1997a">P1997a</a>]
-    [<a href="#P1997b">P1997b</a>]. Google [<a href="#BP1998">BP1998</a>]
+    indexes [<a href="#P1994">P1994</a>] [<a href="#P1997a">P1997a</a>]
+    [<a href="#P1997b">P1997b</a>]. Google [<a href="#BP1998">BP1998</a>]
     circa 1998 in comparison had an index of about 25 million pages.
     These systems used many machines each working on parts of the search
     engine problem. On each machine there would, in addition, be several
     search related processes, and for crawling, hundreds of simultaneous
     threads would be active to manage open connections to remote machines.
-    Without threading downloading millions of pages would be very slow.
-    Yioop! is written in <a href="http://www.php.net/">PHP</a>. This
-    language is the `P' in the very popular
-    <a href="http://en.wikipedia.org/wiki/LAMP_%28software_bundle%29">LAMP</a>
+    Without threading, downloading millions of pages would be very slow.
+    Yioop is written in <a href="http://www.php.net/">PHP</a>. This
+    language is the `P' in the very popular
+    <a href="http://en.wikipedia.org/wiki/LAMP_%28software_bundle%29">LAMP</a>
     web platform. This is one of the reasons PHP was chosen as the language
-    of Yioop! Unfortunately, PHP does not have built-in threads. However,
-    the PHP language does have a multi-curl library (implemented in C) which
-    uses threading to support many simultaneous page downloads. This is what
-    Yioop! uses. Like these early systems Yioop! also supports the ability to
+    of Yioop. Unfortunately, PHP does not have built-in threads. However,
+    the PHP language does have a multi-curl library (implemented in C) which
+    uses threading to support many simultaneous page downloads. This is what
+    Yioop uses. Like these early systems, Yioop also supports the ability to
     distribute the task of downloading web pages to several machines.
    Since the problem of managing many machines becomes more difficult as
-    the number of machines grows, Yioop! further has a web interface for
+    the number of machines grows, Yioop further has a web interface for
     turning on and off the processes related to crawling on remote machines
-    managed by Yioop!</p>
+    managed by Yioop.</p>
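+    <p>The following minimal sketch (not Yioop's actual fetcher code)
+    shows how PHP's multi-curl functions can be used to download several
+    pages concurrently:</p>
+<pre>
+&lt;?php
+// Download a list of urls in parallel using curl_multi
+$urls = array("http://www.example.com/", "http://www.example.org/");
+$multi = curl_multi_init();
+$handles = array();
+foreach ($urls as $url) {
+    $handle = curl_init($url);
+    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
+    curl_multi_add_handle($multi, $handle);
+    $handles[$url] = $handle;
+}
+do { // drive all of the transfers until none are still running
+    curl_multi_exec($multi, $running);
+    curl_multi_select($multi);
+} while ($running > 0);
+$pages = array();
+foreach ($handles as $url => $handle) {
+    $pages[$url] = curl_multi_getcontent($handle);
+    curl_multi_remove_handle($multi, $handle);
+}
+curl_multi_close($multi);
+</pre>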
     <p>There are several aspects of a search engine besides
     downloading web pages that benefit from
     a distributed computational model. One of the reasons Google was able
     to produce high quality results was that it was able to accurately
     rank the importance of web pages. The computation of this page rank
-    involves repeatedly applying Google's normalized variant of the
+    involves repeatedly applying Google's normalized variant of the
     web adjacency matrix to an initial guess of the page ranks. This problem
-    naturally decomposes into rounds. Within a round the Google matrix is
-    applied to the current page ranks estimates of a set of sites. This
-    operation is reasonably easy to distribute to many machines. Computing how
+    naturally decomposes into rounds. Within a round, the Google matrix is
+    applied to the current page rank estimates of a set of sites. This
+    operation is reasonably easy to distribute to many machines. Computing how
     relevant a word is to a document is another
-    task that benefits from multi-round, distributed computation. When a document
-    is processed by indexers on multiple machines, words are extracted and a
-    stemming algorithm such as [<a href="#P1980">P1980</a>] or a character
-    n-gramming technique might be employed (a stemmer would extract the word
-    jump from words such as jumps, jumping, etc; converting jumping to 3-grams
-    would make terms of length 3, i.e., jum, ump, mpi, pin, ing). Next a
-    statistic such as BM25F [<a href="#ZCTSR2004">ZCTSR2004</a>]
-    (or at least the non-query time part of it) is computed to determine the
-    importance of that word in that document compared to that word amongst
-    all other documents. To do this calculation
-    one needs to compute global statistics concerning all documents seen,
-    such as their average-length, how often a term appears in a document, etc.
-    If the crawling is distributed it might take one or more merge rounds to
-    compute these statistics based on partial computations on many machines.
-    Hence, each of these computations benefit from allowing distributed
-    computation to be multi-round. Infrastructure such as the Google
-    File System [<a href="#GGL2003">GGL2003</a>], the MapReduce model [<a
+    task that benefits from multi-round, distributed computation. When a
+    document is processed by indexers on multiple machines, words are extracted
+    and a stemming algorithm such as [<a href="#P1980">P1980</a>] or a character
+    n-gramming technique might be employed (a stemmer would extract the word
+    jump from words such as jumps, jumping, etc; converting jumping to 3-grams
+    would make terms of length 3, i.e., jum, ump, mpi, pin, ing). Next a
+    statistic such as BM25F [<a href="#ZCTSR2004">ZCTSR2004</a>]
+    (or at least the non-query time part of it) is computed to determine the
+    importance of that word in that document compared to that word amongst
+    all other documents. To do this calculation
+    one needs to compute global statistics concerning all documents seen,
+    such as their average-length, how often a term appears in a document, etc.
+    If the crawling is distributed it might take one or more merge rounds to
+    compute these statistics based on partial computations on many machines.
+    Hence, each of these computations benefits from allowing distributed
+    computation to be multi-round. Infrastructure such as the Google
+    File System [<a href="#GGL2003">GGL2003</a>], the MapReduce model [<a
     href="#DG2004">DG2004</a>],
-    and the Sawzall language [<a href="#PDGQ2006">PDGQ2006</a>] were built to
+    and the Sawzall language [<a href="#PDGQ2006">PDGQ2006</a>] were built to
     make these multi-round
     distributed computation tasks easier. In the open source community,
     the <a href="http://hadoop.apache.org/hdfs/"
-    >Hadoop Distributed File System</a>,
-    <a href="http://hadoop.apache.org/mapreduce">Hadoop MapReduce</a>,
-    and <a href="http://hadoop.apache.org/pig/">Pig</a> play an analogous role
+    >Hadoop Distributed File System</a>,
+    <a href="http://hadoop.apache.org/mapreduce">Hadoop MapReduce</a>,
+    and <a href="http://hadoop.apache.org/pig/">Pig</a> play an analogous role
     [<a href="#W2009">W2009</a>]. Recently, a theoretical framework
    for what algorithms can be carried out as rounds of: map inputs to a
    sequence of key-value pairs, shuffle pairs with the same keys to the same
-    nodes, reduce key-value pairs at each node by some computation
-    has begun to be developed [<a
+    nodes, reduce key-value pairs at each node by some computation
+    has begun to be developed [<a
     href="#KSV2010">KSV2010</a>]. This framework shows the map reduce model
     is capable of solving quite general cloud computing problems -- more
-    than is needed just to deploy a search engine.
+    than is needed just to deploy a search engine.
     </p>
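+    <p>As a quick illustration of the character n-gramming idea mentioned
+    above (a toy sketch, not Yioop's implementation), the n-grams of a
+    term can be computed as follows:</p>
+<pre>
+&lt;?php
+// Split a term into overlapping character n-grams; for example,
+// charNgrams("jumping", 3) yields jum, ump, mpi, pin, ing
+function charNgrams($term, $n)
+{
+    $ngrams = array();
+    $len = mb_strlen($term);
+    for ($i = 0; $i + $n &lt;= $len; $i++) {
+        $ngrams[] = mb_substr($term, $i, $n);
+    }
+    return $ngrams;
+}
+print_r(charNgrams("jumping", 3));
+</pre>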
     <p>Infrastructure such as this is not trivial for a small-scale business
     or individual to deploy. On the other hand, most small businesses and
    homes do have several machines available, not all of whose computational
     abilities are being fully exploited. So the capability to do
     distributed crawling and indexing in this setting exists. Further
-    high-speed internet for homes and small businesses is steadily
+    high-speed internet for homes and small businesses is steadily
     getting better. Since the original Google paper, techniques
     to rank pages have been simplified [<a href="#APC2003">APC2003</a>].
     It is also possible to approximate some of the global statistics
-    needed in BM25F using suitably large samples.</p>
-    <p>Yioop! tries to exploit
+    needed in BM25F using suitably large samples.</p>
+    <p>Yioop tries to exploit
     these advances to use a simplified distributed model which might
-    be easier to deploy in a smaller setting. Each node in a Yioop! system
-    is assumed to have a web server running. One of the Yioop! nodes
-    web app's is configured to act as a
+    be easier to deploy in a smaller setting. Each node in a Yioop system
+    is assumed to have a web server running. One of the Yioop nodes'
+    web apps is configured to act as a
     coordinator for crawls. It is called the <b>name server</b>. In addition
-    to the name server, one might have several processes called
-    <b>queue servers</b> that perform scheduling and indexing jobs, as well as
-    <b>fetcher</b> processes which are responsible for downloading pages.
-    Through the name server's web app, users can send messages to the
+    to the name server, one might have several processes called
+    <b>queue servers</b> that perform scheduling and indexing jobs, as well as
+    <b>fetcher</b> processes which are responsible for downloading pages.
+    Through the name server's web app, users can send messages to the
     queue servers and fetchers. This interface writes message
    files that queue servers periodically look for. Fetcher processes
     periodically ping the name server to find the name of the current crawl
-    as well as a list of queue servers. Fetcher programs then periodically
-    make requests in a round-robin fashion to the queue servers for messages
-    and schedules. A schedule is data to process and a message has control
-    information about what kind of processing should be done. A given
-    queue_server is responsible for generating schedule files for data with a
+    as well as a list of queue servers. Fetcher programs then periodically
+    make requests in a round-robin fashion to the queue servers for messages
+    and schedules. A schedule is data to process and a message has control
+    information about what kind of processing should be done. A given
+    queue_server is responsible for generating schedule files for data with a
    certain hash value; for example, it schedules urls with host names
-    that hash to queue server's id.  As a fetcher processes a schedule, it
+    that hash to the queue server's id. As a fetcher processes a schedule, it
     periodically POSTs the result of its computation back to the responsible
     queue server's web server. The data is then written to a set of received
     files. The queue_server as part of its loop looks for received files
     and merges their results into the index so far. So the model is in a
-    sense one round: URLs are sent to the fetchers, summaries of downloaded
+    sense one round: URLs are sent to the fetchers, summaries of downloaded
     pages are sent back to the queue servers and merged into their indexes.
     As soon as the crawl is over one can do text search on the crawl.
-    Deploying this  computation model is relatively simple: The web server
-    software needs to be installed on each machine, the Yioop! software (which
+    Deploying this computation model is relatively simple: The web server
+    software needs to be installed on each machine, the Yioop software (which
    has the fetcher, queue_server, and web app components) is copied to
     the desired location under the web server's document folder, each instance
-    of Yioop! is configured to know who the name server is, and finally,
+    of Yioop is configured to know who the name server is, and finally,
     the fetcher programs and queue server programs are started.
     </p>
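+    <p>To make this message passing concrete, here is a hedged sketch
+    (the actual Yioop protocol, urls, and field names differ) of how a
+    fetcher-style process might ask a coordinating server for work and
+    POST results back:</p>
+<pre>
+&lt;?php
+// Hypothetical fetcher round trip; the urls and fields are made up
+function postData($url, $data)
+{
+    $handle = curl_init($url);
+    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
+    curl_setopt($handle, CURLOPT_POST, true);
+    curl_setopt($handle, CURLOPT_POSTFIELDS, http_build_query($data));
+    $response = curl_exec($handle);
+    curl_close($handle);
+    return $response;
+}
+$schedule = postData("http://queue-server.example.com/",
+    array("request" => "schedule"));
+// ... download and process the pages the schedule lists, then ...
+postData("http://queue-server.example.com/",
+    array("results" => serialize(array("summaries" => "..."))));
+</pre>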
     <p>As an example
-    of how this scales, 2010 Mac Mini running a queue server
+    of how this scales, a 2010 Mac Mini running a queue server
     program can schedule and index about 100,000 pages/hour. This corresponds
     to the work of about 10 fetcher processes (which can be on the same
     machine, if you have enough memory, or different ones). The checks by
     fetchers on the name server are lightweight, so adding another machine with
-    a queue server and the corresponding additional fetchers allows one to
-    effectively double this speed. This also has the benefit of speeding up
-    query processing as when a query comes in, it gets split into queries for
-    each of the queue server's web apps, but the query only "looks" slightly
-    more than half as far into the posting list as would occur in a single
-    queue server setting. To further increase query throughput,
-    the number queries that can be handled at a given time, Yioop! installations
+    a queue server and the corresponding additional fetchers allows one to
+    effectively double this speed. This also has the benefit of speeding up
+    query processing as when a query comes in, it gets split into queries for
+    each of the queue servers' web apps, but the query only "looks" slightly
+    more than half as far into the posting list as would occur in a single
+    queue server setting. To further increase query throughput,
+    the number of queries that can be handled at a given time, Yioop installations
     can also be configured as "mirrors" which keep an exact copy of the
     data stored in the site being mirrored. When a query request comes into a
-    Yioop! node, either it or any of its mirrors might handle it.
+    Yioop node, either it or any of its mirrors might handle it.
     </p>
    <p>Since a multi-million page crawl involves downloading from the
-    web rapidly over several days, Yioop! supports the ability to dynamically
-    change its crawl parameters as a crawl is going on.  This allows a user on
-    request from a web admin to disallow Yioop! from continuing to crawl a site
-    or to restrict the number of urls/hours crawled from a site without
-    having to stop the overall crawl. One can also through a web
+    web rapidly over several days, Yioop supports the ability to dynamically
+    change its crawl parameters as a crawl is going on. This allows a user, on
+    request from a web admin, to disallow Yioop from continuing to crawl a site
+    or to restrict the number of urls/hours crawled from a site without
+    having to stop the overall crawl. One can also, through a web
    interface, inject new seed sites while the crawl is occurring.
     This can help if someone suggests to you a site that might otherwise not
-    be found by Yioop! given its original list of seed sites. Crawling
-    at high-speed can cause a website to become congested and
-    unresponsive. As of Version 0.84, if Yioop! detects a site is
+    be found by Yioop given its original list of seed sites. Crawling
+    at high speed can cause a website to become congested and
+    unresponsive. As of Version 0.84, if Yioop detects a site is
     becoming congested it can automatically slow down the crawling of that site.
    Finally, crawling at high speed can cause your domain name
     server (the server that maps www.yioop.com to 173.13.143.74) to become slow.
     To reduce the effect of this Yioop supports domain name caching.
     </p>
-    <p>Despite its simpler one-round model, Yioop! does a number of things to
-    improve the quality of its search results. For each link extracted from a
-    page, Yioop! creates a micropage which it adds to its index. This includes
-    relevancy calculations for each word in the link as well as an
-    [<a href="#APC2003">APC2003</a>]-based ranking of how important the
-    link was. Yioop! supports a number of iterators which can be thought of
-    as implementing a stripped-down relational algebra geared towards
-    word-document indexes (this is much the same idea as Pig). One of these
+    <p>Despite its simpler one-round model, Yioop does a number of things to
+    improve the quality of its search results. For each link extracted from a
+    page, Yioop creates a micropage which it adds to its index. This includes
+    relevancy calculations for each word in the link as well as an
+    [<a href="#APC2003">APC2003</a>]-based ranking of how important the
+    link was. Yioop supports a number of iterators which can be thought of
+    as implementing a stripped-down relational algebra geared towards
+    word-document indexes (this is much the same idea as Pig). One of these
     operators allows one to make results from unions of stored crawls. This
     allows one to do many smaller topic specific crawls and combine them with
-    your own weighting scheme into a larger crawl. A second useful operator
-    allows you to display a certain number of results from a given subquery,
-    then go on to display results from other subqueries. This allows you to
+    your own weighting scheme into a larger crawl. A second useful operator
+    allows you to display a certain number of results from a given subquery,
+    then go on to display results from other subqueries. This allows you to
     make a crawl presentation like: the first result
-    should come from the open crawl results, the second result from
+    should come from the open crawl results, the second result from
     Wikipedia results, the next result should be an image, and any remaining
-    results should come from the open search results.  This approach is not
-    unlike topic-sensitive page ranking approaches [<a href="#H2002">H2002</a>].
-    Yioop! comes with a GUI facility to make the creation of these crawl mixes
-    easy. Another useful operator Yioop! supports allows one to perform
-    groupings  of document results. In the search results displayed,
-    grouping by url allows all links and documents associated with a url to be
-    grouped as one object. Scoring of this group is a sum of all these scores.
-    Thus, link text is used in the score of a document. How much weight a word
-    from a link gets also depends on the link's rank. So a low-ranked link with
-    the word "stupid" to a given site would tend not to show up early in the
-    results for the word "stupid". Grouping also is used to handle
+    results should come from the open search results.
+    Yioop comes with a GUI facility to make the creation of these crawl mixes
+    easy. To speed up query processing for these crawl mixes, one can also
+    create materialized versions of them, which store the results of a mix
+    in a separate index. Another useful operator Yioop supports allows one to
+    perform groupings of document results. In the search results displayed,
+    grouping by url allows all links and documents associated with a url to be
+    grouped as one object. The score of this group is the sum of all these scores.
+    Thus, link text is used in the score of a document. How much weight a word
+    from a link gets also depends on the link's rank. So a low-ranked link with
+    the word "stupid" to a given site would tend not to show up early in the
+    results for the word "stupid". Grouping also is used to handle
     deduplication: It might be the case that the pages of many different URLs
-    have essentially the same content. Yioop! creates a hash of the web page
+    have essentially the same content. Yioop creates a hash of the web page
     content of each downloaded url. Amongst urls with the same hash only the
     one that is linked to the most will be returned after grouping. Finally,
     if a user wants to do more sophisticated post-processing such as clustering
-    or computing page, Yioop! supports a straightforward architecture
+    or computing page rank, Yioop supports a straightforward architecture
     for indexing plugins.
     </p>
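+    <p>As an aside, the deduplication-by-hash idea just described can be
+    sketched in a few lines of hypothetical code (this is not Yioop's own
+    grouping logic):</p>
+<pre>
+&lt;?php
+// Keep, for each distinct page content, only the most linked-to url.
+// $pages maps url => array("content" => ..., "num_links" => ...)
+function dedupByContentHash($pages)
+{
+    $best = array();
+    foreach ($pages as $url => $page) {
+        $hash = sha1($page["content"]);
+        if (!isset($best[$hash]) ||
+            $page["num_links"] > $best[$hash]["num_links"]) {
+            $best[$hash] = array("url" => $url,
+                "num_links" => $page["num_links"]);
+        }
+    }
+    return $best; // one representative url per content hash
+}
+</pre>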
     <p>
     There are several open source crawlers which can scale to crawls in the
-    millions to hundred of millions of pages. Most of these are written in
-    Java, C, C++, C#, not PHP. Three important ones are <a
+    millions to hundreds of millions of pages. Most of these are written in
+    Java, C, C++, or C#, not PHP. Three important ones are <a
     href="http://nutch.apache.org/">Nutch</a>/
-    <a href="http://lucene.apache.org/">Lucene</a>/ <a
+    <a href="http://lucene.apache.org/">Lucene</a>/ <a
     href="http://lucene.apache.org/solr/">Solr</a>
     [<a href="KC2004">KC2004</a>], <a href="http://www.yacy.net/">YaCy</a>,
-    and <a href="http://crawler.archive.org/">Heritrix</a>
+    and <a href="http://crawler.archive.org/">Heritrix</a>
     [<a href="#MKSR2004">MKSR2004</a>]. Nutch is the original application for
     which the Hadoop infrastructure described above was developed. Nutch
     is a crawler, Lucene is for indexing, and Solr is a search engine front end.
-    The YaCy project uses an interesting distributed hash table
-    peer-to-peer approach to crawling, indexing, and search. Heritrix is
-    a web crawler developed at the <a
+    The YaCy project uses an interesting distributed hash table
+    peer-to-peer approach to crawling, indexing, and search. Heritrix is
+    a web crawler developed at the <a
     href="http://www.archive.org/">Internet Archive</a>. It was designed to do
     archival quality crawls of the web. Its ARC file format
-    inspired the use of WebArchive objects in Yioop!. WebArchives are Yioop!'s
+    inspired the use of WebArchive objects in Yioop. WebArchives are Yioop's
     container file format for storing web pages, web summary data, url lists,
-    and other kinds of data used by Yioop!. A WebArchive is essentially a
-    linked-list of compressed, serialized PHP objects, the last element in this
-    list containing a header object with information like version number and a
+    and other kinds of data used by Yioop. A WebArchive is essentially a
+    linked-list of compressed, serialized PHP objects, the last element in this
+    list containing a header object with information like version number and a
     total count of objects stored. The compression format can be chosen to
     suit the kind of objects being stored. The header can be used to store
     auxiliary data structures into the list if desired. One nice aspect of
     serialized PHP objects versus serialized Java Objects is that they are
    human-readable text strings. The main purpose of
-    Web Archives is to allow one to store
-    many small files compressed as one big file. They also make data from web
+    Web Archives is to allow one to store
+    many small files compressed as one big file. They also make data from web
     crawls very portable, making them easy to copy from one location to another.
-    Like Nutch and Heritrix, Yioop! also has a command-line tool for quickly
+    Like Nutch and Heritrix, Yioop also has a command-line tool for quickly
     looking at the contents of such archive objects.
     </p>
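+    <p>To give a feel for this format, appending a record to an archive as
+    a compressed, serialized PHP object might look like the following
+    simplified sketch (the real WebArchive code differs):</p>
+<pre>
+&lt;?php
+// Append one length-prefixed, gzipped, serialized object to a file
+function appendObject($archive_file, $object)
+{
+    $record = gzcompress(serialize($object));
+    $data = pack("N", strlen($record)) . $record; // 4-byte length prefix
+    file_put_contents($archive_file, $data, FILE_APPEND);
+}
+appendObject("toy.archive", array("url" => "http://www.example.com/",
+    "summary" => "An example page summary"));
+</pre>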
     <p>The <a href="http://www.archive.org/web/researcher/ArcFileFormat.php">ARC
     format</a> is one example of an archival file format for web data. Besides
-    at the Internet Archive, ARC and its successor
+    its use at the Internet Archive, ARC and its successor
     <a href="
-    http://www.digitalpreservation.gov/formats/fdd/fdd000236.shtml">WARC
-    format</a> are often used by TREC conferences to store test data sets such
+    http://www.digitalpreservation.gov/formats/fdd/fdd000236.shtml">WARC
+    format</a> are often used by TREC conferences to store test data sets such
     as <a href="http://ir.dcs.gla.ac.uk/test_collections/">GOV2</a> and the
     <a href="http://boston.lti.cs.cmu.edu/Data/clueweb09/">ClueWeb Dataset</a>.
-    In addition, it was used by grub.org (hopefully, only on a
+    In addition, it was used by grub.org (hopefully, only on a
    temporary hiatus), a distributed, open-source search engine project in C#.
     Another important format for archiving web pages is the XML format used by
-    <a href="http://www.wikipedia.org/">Wikipedia</a> for archiving MediaWiki
-    wikis. Wikipedia offers <a
-    href="http://en.wikipedia.org/wiki/Wikipedia:Database_download">creative
-    common-licensed downloads</a>
-    of their site in this format. The <a href="http://www.dmoz.org/">Open
-    Directory Project</a> makes available its <a
+    <a href="http://www.wikipedia.org/">Wikipedia</a> for archiving MediaWiki
+    wikis. Wikipedia offers <a
+    href="http://en.wikipedia.org/wiki/Wikipedia:Database_download">creative
+    common-licensed downloads</a>
+    of their site in this format. The <a href="http://www.dmoz.org/">Open
+    Directory Project</a> makes available its <a
     href="http://www.dmoz.org/rdf.html">ODP data set</a> in an RDF-like format
     licensed using the Open Directory License. Thus, we see that there are many
     large scale useful data sets that can be easily licensed. Raw data dumps
     do not contain indexes of the data though. This makes sense because indexing
     technology is constantly improving and it is always possible to re-index
-    old data. Yioop! supports importing and indexing data from ARC,
-    MediaWiki XML dumps, and Open Directory RDF, it also supports re-indexing of
-    old Yioop! data files created after version 0.66. This means using Yioop!
-    you can have searchable access to many data sets as well as have the
-    ability to maintain your data going forward.
+    old data. Yioop supports importing and indexing data from ARC, database
+    queries, log files, MediaWiki XML dumps, and Open Directory RDF. It also
+    supports re-indexing of old Yioop data files created after version 0.66,
+    and indexing crawl mixes. This means using Yioop
+    you can have searchable access to many data sets as well as have the
+    ability to maintain your data going forward. When displaying caches of
+    web pages in Yioop, the interface further supports the ability to display
+    a history of all cached copies of that page, in a similar fashion to the
+    Internet Archive's interface.
     </p>
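+    <p>For a flavor of what processing such dumps involves (a standalone
+    sketch using PHP's XMLReader, not Yioop's archive iterator code), one
+    can walk the page titles of a MediaWiki XML dump like this:</p>
+<pre>
+&lt;?php
+// Stream over a MediaWiki XML dump, printing each page title
+$reader = new XMLReader();
+$reader->open("enwiki-dump.xml");
+while ($reader->read()) {
+    if ($reader->nodeType != XMLReader::ELEMENT) { continue; }
+    if ($reader->name == "title") {
+        // a real importer would gather the page text as well
+        echo $reader->readString() . "\n";
+    }
+}
+$reader->close();
+</pre>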
     <p>Another important aspect of creating a modern search engine is
     the ability to display in an appropriate way various media sources.
    Yioop comes with built-in subsearch abilities for images, where
     results are displayed as image strips; video, where thumbnails for
     video are shown; and news, where news items are grouped together and
-    a configurable set of news/twitter feeds can be set to be updated on an
+    a set of news/twitter feeds can be configured to update on an
     hourly basis.</p>
     <p>
-    This concludes the discussion of how Yioop! fits into the current and
-    historical landscape of search engines and indexes. Here is short summary
-    features of Yioop! that should make sense after and be taken away from
+    This concludes the discussion of how Yioop fits into the current and
+    historical landscape of search engines and indexes. Here is a short
+    summary of the features of Yioop that should make sense after, and be taken away from,
     this introduction:
     </p>
     <ul>
-    <li>Yioop! is an open-source, distributed crawler and search engine
+    <li>Yioop is an open-source, distributed crawler and search engine
     written in PHP.</li>
     <li>It is capable of crawling and indexing small sites to sites or
    collections of sites containing ten million to low hundreds of millions
@@ -358,13 +364,13 @@
    <li>It has a web interface to select seed sites for crawls and to set which
    sites should not be crawled.</li>
    <li>It obeys robots.txt files, including Google and Bing extensions such
-    as the Crawl-delay and Sitemap directives as well as * and $ in allow and
+    as the Crawl-delay and Sitemap directives as well as * and $ in allow and
     disallow. It further supports robots meta tag NONE, NOINDEX, NOFOLLOW,
     NOARCHIVE, and NOSNIPPET and anchor tags with rel="nofollow"
     attributes. It also supports X-Robots-Tag HTTP headers.</li>
-    <li>Yioop! supports crawl quotas for web sites. i.e., one can control
+    <li>Yioop supports crawl quotas for web sites, i.e., one can control
     the number of urls/hour downloaded from a site.</li>
-    <li>Yioop! can detect website congestion and slow down crawling
+    <li>Yioop can detect website congestion and slow down crawling
     a site that it detects as congested.</li>
     <li>It supports open web crawls, but through its web interface one can
    configure it also to crawl only specific sites, domains, or collections
@@ -372,11 +378,11 @@
     directives to crawl a site to a fixed depth.</li>
     <li>It supports dynamically changing the allowed and disallowed
     sites while a crawl is in progress.</li>
-    <li>It supports dynamically injecting new seeds site via a web
+    <li>It supports dynamically injecting new seed sites via a web
     interface into the active crawl.</li>
     <li>It has its own DNS caching mechanism.</li>
-    <li>Yioop! supports the indexing of many different filetypes including:
-    HTML, BMP, DOC, ePub, GIF, JPG, PDF, PPT, PPTX, PNG, RSS, RTF, sitemaps,
+    <li>Yioop supports the indexing of many different filetypes including:
+    HTML, BMP, DOC, ePub, GIF, JPG, PDF, PPT, PPTX, PNG, RSS, RTF, sitemaps,
     SVG, XLSX, and XML. It has a web interface for controlling which amongst
     these filetypes (or all of them) you want to index.</li>
     <li>Yioop supports subsearches geared towards presenting certain
@@ -385,72 +391,71 @@
     hourly.</li>
     <li>Crawling, indexing, and serving search results can be done on a
     single machine or distributed across several machines.</li>
-    <li>It uses a simplified distributed model that is straightforward to
-    deploy.</li>
-    <li>The fetcher/queue_server processes on several machines can be
-    managed through the web interface of a main Yioop! instance.</li>
-    <li>Yioop! installations can be screated with a variety of topologies:
+    <li>The fetcher/queue_server processes on several machines can be
+    managed through the web interface of a main Yioop instance.</li>
+    <li>Yioop installations can be created with a variety of topologies:
     one queue_server and many fetchers or several queue_servers and
     many fetchers.</li>
     <li>It determines search results using a number of iterators which
     can be combined like a simplified relational algebra.</li>
    <li>Yioop can be configured to display word suggestions as a user
-    types a query.</li>
+    types a query. It can also suggest spell corrections for mistyped
+    queries.</li>
     <li>Since version 0.70, Yioop indexes are positional rather than
-    bag of word indexes, and a index compression scheme called Modified9
+    bag-of-words indexes, and an index compression scheme called Modified9
     is used.</li>
-    <li>Yioop! supports a web interface which makes
+    <li>Yioop supports a web interface which makes
     it easy to combine results from several crawl indexes to create unique
     result presentations. These combinations can be done in a conditional
     manner using "if:" meta words.</li>
     <li>Indexing occurs as crawling happens, so when a crawl is stopped,
     it is ready to be used to handle search queries immediately.</li>
-    <li>Yioop! supports an indexing plugin architecture to make it
-    possible to write one's own indexing modules that do further
+    <li>Yioop supports an indexing plugin architecture to make it
+    possible to write one's own indexing modules that do further
     post-processing.</li>
-    <li>Yioop! has a web form that allows a user to control the recrawl
+    <li>Yioop has a web form that allows a user to control the recrawl
     frequency for a page during a crawl.</li>
-    <li>Yioop! has a web form that allows users to specify meta words
+    <li>Yioop has a web form that allows users to specify meta words
     to be injected into an index based on whether a downloaded document matches
     a url pattern.</li>
-    <li>Yioop! uses a web archive file format which makes it easy to
+    <li>Yioop uses a web archive file format which makes it easy to
     copy crawl results amongst different machines. It has a command-line
-    tool for inspecting these archives if they need to examined
+    tool for inspecting these archives if they need to be examined
     in a non-web setting. It also supports command-line search querying
     of these archives.</li>
     <li>Using web archives, crawls can be mirrored amongst several machines
    to speed up serving search results. This can be further sped up
     by using memcache or filecache.</li>
-    <li>Yioop! supports the ability to filter out urls from search
+    <li>Yioop supports the ability to filter out urls from search
     results after a crawl has been performed. It also has the ability
     to edit summary information that will be displayed for urls.</li>
-    <li>A given Yioop! installation might have several saved crawls and
+    <li>A given Yioop installation might have several saved crawls and
     it is very quick to switch between any of them and immediately start
     doing text searches.</li>
-    <li>Yioop! supports importing data from ARC, MediaWiki XML, and ODP
+    <li>Yioop supports importing data from ARC, MediaWiki XML, and ODP
    RDF files; it also supports re-indexing of data from WebArchives created
     since version 0.66.</li>
-    <li>Yioop! comes with its own extendable model-view-controller
+    <li>Yioop comes with its own extendable model-view-controller
     framework that you can use directly to create new sites that use
-    Yioop! search technology. This framework also comes with a GUI
+    Yioop search technology. This framework also comes with a GUI
     which makes it easy to localize strings and static pages.</li>
     <li>Besides standard output of a web page with ten links it is possible
     to get query results in Open Search RSS format and also to query
-    Yioop! data via a function api.</li>
-    <li>Yioop! has been optimized to work well with smart phone web browsers
+    Yioop data via a function API.</li>
+    <li>Yioop has been optimized to work well with smart phone web browsers
     and with tablet devices.</li>
-    <li>Yioop! has built-in support for image and video specific search</li>
+    <li>Yioop has built-in support for image and video specific search.</li>
     </ul>
     <p><a href="#toc">Return to table of contents</a>.</p>

     <h2 id="requirements">Requirements</h2>
-    <p>The Yioop! search engine requires: (1) a web server, (2) PHP 5.3 or
-    better (Yioop! used only to serve search results from a pre-built index
+    <p>The Yioop search engine requires: (1) a web server, (2) PHP 5.3 or
+    better (Yioop used only to serve search results from a pre-built index
     has been tested to work in PHP 5.2), (3) Curl libraries for downloading
-    web pages. To be a little more specific Yioop! has been tested with
+    web pages. To be a little more specific, Yioop has been tested with
     Apache 2.2 and I've been told Version 0.82 or newer works with lighttpd.
     It should work with other webservers, although it might take some
-    finessing. For PHP,
+    finessing. For PHP,
     you need a build of PHP that incorporates multi-byte string (mb_ prefixed)
     functions, Curl, Sqlite (or at least PDO with Sqlite driver),
     the GD graphics library and the command-line interface. If you are using
@@ -475,11 +480,13 @@ post_max_size = 8M
 to
 post_max_size = 32M
 </pre>
+<p>Yioop will work with the post_max_size set to as little as two
+megabytes but will be faster with the larger post capacity.</p>
 <p>If you are using WAMP, similar changes
 as with XAMPP must be made, but be aware that WAMP has two php.ini
 files and both of these must be changed.</p>
 <p>
-    If you are using the Ubuntu-variant of Linux, the following lines would
+    If you are using the Ubuntu-variant of Linux, the following lines would
     get the software you need:
     </p>
     <pre>
@@ -494,14 +501,14 @@ files and both of these must be changed.</p>
     <p>For both Mac and Linux, you need to alter the post_max_size
     variable in your php.ini file as in the Windows case above.</p>
     <p>In addition to the minimum installation requirements above, if
-    you want to use the Manage Machines feature in Yioop!, you might need
+    you want to use the Manage Machines feature in Yioop, you might need
     to do some additional configuration. The <a href="#machines"
     >Manage Machines</a> activity
-    allows you through a web interface to start/stop and look at the
-    log files for each of the queue_servers, and fetchers that you want
-    Yioop! to manage. If it is not configured then these task would need
+    allows you, through a web interface, to start/stop and look at the
+    log files for each of the queue_servers and fetchers that you want
+    Yioop to manage. If it is not configured, then these tasks would need
     to be done via the command line. <b>Also, if you do not use
-    the Manage Machine interface your Yioop! site can make use of only one
+    the Manage Machines interface, your Yioop site can make use of only one
     queue_server.</b> On OSX and Linux, Manage Machines
     needs to be able to schedule "at" batch jobs (type man at to find out
     more about these). On OSX to enable
@@ -518,31 +525,37 @@ sudo launchctl load -w /System/Library/LaunchDaemons/com.apple.atrun.plist
     of these platforms you need to ensure that Apache is not running as
     nobody. Edit the $XAMPP/etc/httpd.conf file and set the User and Group
     to a real user.</p>
+    <p>Some versions of Linux, like CentOS, have the web-server user (apache
+    on CentOS) configured with noshell as its shell and make use of
+    SELinux to provide mandatory access control. Both of these can prevent
+    at jobs from being scheduled by the web server. You can use
+    the command <tt>usermod -s /bin/sh apache</tt> to set the shell and edit
+    the SELinux domain of the web server to fix these issues in this case.</p>
     <p>To get Manage Machines to work on a PC you need to first install
     PsTools from Microsoft.<br />
 <a href="http://technet.microsoft.com/en-us/sysinternals/bb896649">
 http://technet.microsoft.com/en-us/sysinternals/bb896649</a>.<br />
-    Depending on how your machine is configured this can be a security risk, so
+    Depending on how your machine is configured, this can be a security risk, so
     do some research before deciding if you really want to do this. After
     installing PsTools you next need to edit your Environment Variables
     and add both the path to psexec and php to your PATH variable. You can
    find the place to set these variables by clicking on the Start Menu,
     then Control Panel, System and Security, Advanced Systems and Settings.</p>
-    <p>As a final step, after installing the necessary software,
-    <b>make sure to start/restart your web server and verify that
+    <p>As a final step, after installing the necessary software,
+    <b>make sure to start/restart your web server and verify that
     it is running.</b></p>
     <h3>Memory Requirements</h3>
-    <p>In addition, to the prerequisite software listed above, Yioop! also
-    has certain memory requirements. By default bin/queue_server.php
-    requires 1400MB, bin/fetcher.php requires 850MB, and index.php requires
-    500MB. These  values are set near the tops of each of these files in turn
+    <p>In addition, to the prerequisite software listed above, Yioop also
+    has certain memory requirements. By default bin/queue_server.php
+    requires 1400MB, bin/fetcher.php requires 850MB, and index.php requires
+    500MB. These values are set near the top of each of these files
     with a line like:</p>
 <pre>
 ini_set("memory_limit","1400M");
 </pre>
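+    <p>
+    For instance, to run the fetcher in a more memory-constrained environment
+    you might lower its limit to a smaller, illustrative value such as the
+    one below (the next paragraph describes configuration constants that
+    should be reduced in tandem):
+    </p>
+<pre>
+ini_set("memory_limit","600M");
+</pre>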
     <p>
     If you want to reduce these memory requirements, it is advisable to also
-    reduce the values for some variables in the configs/config.php file.
+    reduce the values for some variables in the configs/config.php file.
     For instance, one might reduce the values of NUM_DOCS_PER_GENERATION,
     SEEN_URLS_BEFORE_UPDATE_SCHEDULER, NUM_URLS_QUEUE_RAM,
     MAX_FETCH_SIZE, and URL_FILTER_SIZE. Experimenting with these values
@@ -551,19 +564,19 @@ ini_set("memory_limit","1400M");
     <p><a href="#toc">Return to table of contents</a>.</p>
     <h2 id='installation'>Installation and Configuration</h2>
 <p>
-The Yioop! application can be obtained using the
+The Yioop application can be obtained using the
 <a href="http://www.seekquarry.com/?c=main&p=downloads">download page at
 seekquarry.com</a>.
-After downloading and unzipping it, move the Yioop! search engine into some
-folder under your web server's document root. Yioop! makes use of an auxiliary
-folder to store profile/crawl data. Before Yioop! will
+After downloading and unzipping it, move the Yioop search engine into some
+folder under your web server's document root. Yioop makes use of an auxiliary
+folder to store profile/crawl data. Before Yioop will
 run you must configure this directory. This can be done in one
-of two ways: either through the web interface (the preferred way), as we
-will now describe or using the configs/configure_tool.php script
+of two ways: either through the web interface (the preferred way), as we
+will now describe, or using the configs/configure_tool.php script
 (which is harder, but might be suitable for some VPS settings) which will be
 described in the <a href="#commandline">command line tools section</a>.
 From the web interface, to configure this directory
-point your web browser to where your Yioop! folder is located, a
+point your web browser to where your Yioop folder is located; a
 configuration page should appear and let you set the
 path to the auxiliary folder (Search Engine Work Directory). This
 page looks like:
@@ -572,7 +585,7 @@ page looks like:
 <p>
 For this step, as a security precaution, you must connect via localhost. If you
 are in a web hosting environment (for example, if you are using cPanel
-to set up Yioop!) where it is difficult to connect using localhost, you can
+to set up Yioop) where it is difficult to connect using localhost, you can
 add a file, configs/local_config.php, with the following content:</p>
 <pre>
 &lt;?php
@@ -581,13 +594,13 @@ define('NO_LOCAL_CHECK', 'true');
 </pre>
 <p> Returning to our installation discussion, notice under the text field there
 is a heading "Component Check" with red text under it; this section is
-used to indicate any requirements that Yioop! has that might not be met yet on
+used to indicate any requirements that Yioop has that might not be met yet on
 your machine. In the case above, the web server needs permissions on the
 file configs/config.php to write in the value of the directory you choose in the
-form for the Work Directory. Another common message asks you to make sure the
+form for the Work Directory. Another common message asks you to make sure the
 web server has permissions on the place where this auxiliary
-folder needs to be created. When filling out the form of this page, on both
-*nix-like, and Windows machines, you should use forward slashes for the folder
+folder needs to be created. When filling out the form of this page, on both
+*nix-like and Windows machines, you should use forward slashes for the folder
 location. For example,
 </p>
 <pre>
@@ -599,40 +612,40 @@ Once you have set the folder,
 you should see a second Profile Settings form beneath the Search Engine
 Work Directory
 form. If you are asked to sign-in before this, and you have not previously
-created accounts in this Work Directory, then the default account has login
-root, and an empty password. Once you see it, The Profile Settings form
+created accounts in this Work Directory, then the default account has login
+root and an empty password. Once you see it, the Profile Settings form
 allows you to configure the debug, search access,
-database, queue server, and robot settings. It will look
+database, queue server, and robot settings. It will look
 something like:
 </p>
 <img src='resources/ConfigureScreenForm2.png' alt='The configure form'/>
 <p>The <b>Debug Display</b> field set has three check boxes: Error Info, Query
-Info, and Test Info. Checking Error Info will mean that when the Yioop!
+Info, and Test Info. Checking Error Info will mean that when the Yioop
 web app runs, any PHP Errors, Warnings, or Notices will be displayed
 on web pages. This is useful if you need to do debugging, but should not
 be set in a production environment. The second checkbox, Query Info, when
 checked, will cause statistics about the time, etc. of database queries
 to be displayed at the bottom of each web page. The last checkbox,
 Test Info, says whether or not to display automated tests of some of the
-systems library classes if the browser is navigated to
+system's library classes if the browser is navigated to
 http://YIOOP_INSTALLATION/tests/. None of these debug settings should
 be checked in a production environment.
 </p>
 <p>The <b>Search Access</b> field set has three check boxes:
 Web, RSS, and API. These control whether a user can use the
 web interface to get query results, whether RSS responses to queries
-are permitted, or whether or not the function based search API is
+are permitted, or whether or not the function based search API is
 available. Using the Web Search interface
 and formatting a query url to get an RSS response are
-describe in the <a href="#interface">Yioop! Search and User Interface
-section</a>. The Yioop! Search Function API is described in the
-section <a href="#embedding">Embedding Yioop!</a>, you can also look
+described in the <a href="#interface">Yioop Search and User Interface
+section</a>. The Yioop Search Function API is described in the
+section <a href="#embedding">Embedding Yioop</a>; you can also look
 in the examples folder at the file search_api.php to see an example
-of how to use it. <b>If you intend to use Yioop!
+of how to use it. <b>If you intend to use Yioop
 in a configuration with multiple queue servers (not fetchers), then
 the RSS check box needs to be checked.</b></p>
 <p>The <b>Database Set-up</b> fieldset is used to specify what database
-management system should be used, how it should be connected to, and what
+management system should be used, how it should be connected to, and what
 user name and password should be used for the connection. At present sqlite2
 (called just sqlite), sqlite3, and MySQL databases are supported. The
 database is used to store information about what users are allowed to
@@ -642,68 +655,68 @@ an sqlite or sqlite3 database is being used then the connection is always
 a file on the current filesystem and there is no notion of login
 and password, so in this case only the name of the database is asked for.
 For sqlite, the database is stored in WORK_DIRECTORY/data. When switching
-database information, Yioop! checks first if a usable database with the user
+database information, Yioop checks first if a usable database with the user
 supplied data exists. If it does, then it uses it; otherwise, it tries to
-create a new database. Yioop! comes with a small sqlite demo database in the
+create a new database. Yioop comes with a small sqlite demo database in the
 data directory and this is used to populate the installation database in this
-case. This database has one account root with no password
+case. This database has one account root with no password
 which has privileges on all activities. Since different databases associated
-with a Yioop! installation might have different user accounts set-up after
+with a Yioop installation might have different user accounts set-up, after
 changing database information you might have to sign in again.
 </p>
 <p>The <b>Search Page Elements and Links</b> fieldset is used to tell
-you which element and links you would like to have presented on the search
+you which elements and links you would like to have presented on the search
 landing and search results pages. The Word Suggest check box controls whether
-a drop down of word suggestions should be presented by Yioop! when a user
+a drop down of word suggestions should be presented by Yioop when a user
 starts typing in the Search box. The Subsearch checkbox controls whether the
-links for Image, Video, and News search appear in the top bar of Yioop!
-You can actually configure what these links are in the
+links for Image, Video, and News search appear in the top bar of Yioop.
+You can actually configure what these links are in the
 <a href="#sources">Search Sources</a>
 activity. The checkbox here is a global setting for displaying them or
 not. In addition, if this is unchecked then the hourly activity of
 downloading any RSS media sources for the News subsearch will be turned
-off. The Signin  checkbox controls whether to display the link to the page
-for users to sign in  to Yioop!  The Cache checkbox toggles whether a link to
-the cache of a search item should be displayed as part of each search result.
-The Similar checkbox toggles whether a link to similar search items should be
+off. The Signin checkbox controls whether to display the link to the page
+for users to sign in to Yioop. The Cache checkbox toggles whether a link to
+the cache of a search item should be displayed as part of each search result.
+The Similar checkbox toggles whether a link to similar search items should be
 displayed as part of each search result. The Inlinks checkbox toggles
 whether a link for inlinks to a search item should be displayed as part
 of each search result. Finally, the IP address checkbox toggles
 whether a link for pages with the same ip address should be displayed as part
 of each search result.</p>

-<p>The <b>Name Server Set-up</b> fieldset is used to tell Yioop! which machine
+<p>The <b>Name Server Set-up</b> fieldset is used to tell Yioop which machine
 is going to act as a name server during a crawl and what secret string
 to use to make sure that communication is being done between
 legitimate queue_servers and fetchers of your installation. You can
 choose anything for your secret string as long as you use the same
-string amongst all of the machines in your Yioop! installation.
-The reason why you have to set the name server url is that each machine that
-is going to run a fetcher to download web pages needs to know who the
-queue servers are so they can request a batch of urls to download. There are a
+string amongst all of the machines in your Yioop installation.
+The reason why you have to set the name server url is that each machine that
+is going to run a fetcher to download web pages needs to know who the
+queue servers are so they can request a batch of urls to download. There are a
 few different ways this can be set-up:
 </p>
 <ol>
-<li>If the particular instance of Yioop! is only being used to display
+<li>If the particular instance of Yioop is only being used to display
 search results from crawls that you have already done, then this
 fieldset can be filled in however you want.</li>
 <li>If you are doing crawling on only one machine, you can put
 http://localhost/path_to_yioop/ or
-http://127.0.0.1/path_to_yioop/, where you appropriately modify
+http://127.0.0.1/path_to_yioop/, where you appropriately modify
 "path_to_yioop".</li>
 <li>Otherwise, if you are doing a crawl on multiple machines, use
-the url of Yioop! on the machine that will act as the name server.</li>
+the url of Yioop on the machine that will act as the name server.</li>
 </ol>
-<p>In communicating between the fetcher and the server, Yioop! uses
+<p>In communicating between the fetcher and the server, Yioop uses
 curl. Curl can be particular about redirects in the case where posted
 data is involved; i.e., if a redirect happens, it does not send posted
-data to the redirected site. For this reason, Yioop! insists on a trailing
+data to the redirected site. For this reason, Yioop insists on a trailing
 slash on your queue server url. Beneath the Queue Server Url
 field is a Memcached checkbox and a Filecache checkbox. Only one of these
 can be checked at a time. Checking the Memcached checkbox allows you to specify
-memcached servers that, if specified, will be used to cache in memory search
+memcached servers that, if specified, will be used to cache in memory search
 query results as well as index pages that have been accessed. Checking the
-Filecache box, tells Yioop! to cache search query results in temporary files.
+Filecache box, tells Yioop to cache search query results in temporary files.
 Memcached probably gives a better performance boost than Filecaching, but
 not all hosting environments have Memcached available.
 </p>
@@ -713,21 +726,21 @@ to provide websites that you crawl with information about who is crawling them.
 The field Crawl Robot Name is used to say the name of your robot. You should
 choose a common name for all of the fetchers in your set-up, but the name
 should be unique to your web-site. It is bad form to pretend to be someone
-else's robot, for example, the googlebot. As Yioop! crawls it sends the web-site
+else's robot, for example, the googlebot. As Yioop crawls, it sends the web-site
 it crawls a User-Agent string; this string contains the url back to the bot.php
-file in the Yioop! folder. bot.php is supposed to provide a detailed description
-of your robot. The contents of textarea Robot Description is supposed to
+file in the Yioop folder. bot.php is supposed to provide a detailed description
+of your robot. The contents of the Robot Description textarea are supposed to
 provide this description and is inserted between &lt;body&gt; &lt;/body&gt;
 tags on the bot.php page.
 </p>
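+<p>
+For example, if the Crawl Robot Name were set to TestBot, the User-Agent
+string sent with each request would contain a link of roughly the following
+shape (the exact format can vary between Yioop versions, and the host and
+path below are placeholders for your own installation):
+</p>
+<pre>
+Mozilla/5.0 (compatible; TestBot; +http://www.example.com/path_to_yioop/bot.php)
+</pre>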

 <p>
-After filling in all the fieldsets and submitting the form,
+After filling in all the fieldsets and submitting the form,
 the installation is complete.
 </p>
     <p><a href="#toc">Return to table of contents</a>.</p>
     <h2 id='files'>Summary of Files and Folders</h2>
-    <p>The Yioop! search engine consists of three main
+    <p>The Yioop search engine consists of three main
 scripts:</p>
 <dl>
 <dt>bin/fetcher.php</dt><dd>Used to download batches of urls provided
@@ -738,58 +751,58 @@ scripts:</p>
     Its last responsibility is to create the index_archive
     that is used by the search front end.</dd>
 <dt>index.php</dt><dd>Acts as the search engine web page. It is also used
-    to handle message passing between the fetchers
+    to handle message passing between the fetchers
     (multiple machines can act as fetchers) and the
     queue_server.</dd>
 </dl>
 <p>The file index.php is used when you browse to an installation
-of a Yioop! website. The description of how to use a Yioop! web site is
-given in the sections starting from the The Yioop! User Interface section.
+of a Yioop website. The description of how to use a Yioop web site is
+given in the sections starting from The Yioop Search and User Interface section.
 The files fetcher.php and queue_server.php are only connected with crawling
 the web. If one already has a stored crawl of the web, then you no longer
 need to run or use these programs. For instance, you might obtain a crawl of
 the web on your home machine and upload the crawl to
-an instance of Yioop! on the ISP hosting your website. This website could
+an instance of Yioop on the ISP hosting your website. This website could
 serve search results without making use of either fetcher.php or
-queue_server.php. To perform a web crawl you need to use both
-of these programs however as well as the Yioop! web site. This is explained in
+queue_server.php. To perform a web crawl, however, you need to use both
+of these programs as well as the Yioop web site. This is explained in
 detail in the section Managing Crawls.
 </p>
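+<p>
+As a preview, starting the two crawl programs by hand from a terminal looks
+roughly like the following (a sketch, assuming the current directory is the
+Yioop folder; details are given in the Managing Crawls and command line tools
+sections):
+</p>
+<pre>
+php bin/queue_server.php terminal
+php bin/fetcher.php terminal
+</pre>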
-<p>The Yioop! folder itself consists of several files and sub-folders.
-The file index.php as mentioned above is the main entry point into the Yioop!
+<p>The Yioop folder itself consists of several files and sub-folders.
+The file index.php as mentioned above is the main entry point into the Yioop
 web application. yioopbar.xml is the xml file specifying how to access
 Yioop as an Open Search Plugin. favicon.ico is used to display the little
-icon in the url bar of a browser when someone browses to the Yioop! site.
-A URL to the file bot.php is given by the Yioop! robot
+icon in the url bar of a browser when someone browses to the Yioop site.
+A URL to the file bot.php is given by the Yioop robot
 as it crawls websites so that website owners can find out information
 about who is crawling their sites. Here is a rough guide to what
-the Yioop! folder's various sub-folders contain:
+the Yioop folder's various sub-folders contain:
 <dl>
-<dt>bin</dt><dd>This folder is intended to hold command-line scripts
-which are used in conjunction with Yioop! In addition to the fetcher.php
+<dt>bin</dt><dd>This folder is intended to hold command-line scripts
+which are used in conjunction with Yioop. In addition to the fetcher.php
 and queue_server.php scripts already mentioned, it contains arc_tool.php,
 mirror.php, and query_tool.php. arc_tool.php can be used to examine the contents
 of WebArchiveBundles and IndexArchiveBundles from the command line.
 mirror.php can be used if you would like to create a mirror/copy of a Yioop
-installation.  Finally, query_tool.php can be used to run queries
+installation.  Finally, query_tool.php can be used to run queries
 from the command-line.</dd>
 <dt>configs</dt><dd>This folder contains configuration files. You will
 probably not need to edit any of these files directly as you can set the most
-common configuration settings from with the admin panel of Yioop! The file
+common configuration settings from within the admin panel of Yioop. The file
 config.php controls a number of parameters about how data is stored, how,
-and how often, the queue_server and fetchers communicate, and which file types
-are supported by Yioop! configure_tool.php is a command-line tool which
-can perform some of the configurations needed to get a Yioop! installation
+and how often, the queue_server and fetchers communicate, and which file types
+are supported by Yioop. configure_tool.php is a command-line tool which
+can perform some of the configurations needed to get a Yioop installation
 running. It is only necessary in some virtual private server settings --
-the prefered way to configure Yioop! is through the web interface.
+the preferred way to configure Yioop is through the web interface.
 createdb.php can be used to create a bare instance of
-the Yioop! database with a root admin user having no password. This script is
+the Yioop database with a root admin user having no password. This script is
 not strictly necessary as the database should be creatable via the admin panel;
 however, it can be useful if the database isn't working for some reason.
-Also, in the configs folder is the file default_crawl.ini. This file is
+Also, in the configs folder is the file default_crawl.ini. This file is
 copied to WORK_DIRECTORY after you set this folder in the admin/configure panel.
 There it is renamed as crawl.ini and serves as the initial set of sites to crawl
-until you decide to change these. The file token_tool.php is a tool which can
+until you decide to change these. The file token_tool.php is a tool which can
 be used to help in term extraction during crawls and for making tries
 which can be used for word suggestions for a locale. To help word extraction
 this tool can generate in a locale folder (see below) a word gram bloom filter.
@@ -800,8 +813,8 @@ these word grams. For trie construction this tool can use a file that lists
 words, one per line.
 </dd>
 <dt>controllers</dt><dd>The controllers folder contains all the controller
-classes used by the web component of the Yioop! search engine. Most requests
-coming into Yioop! go through the top level index.php file. The query
+classes used by the web component of the Yioop search engine. Most requests
+coming into Yioop go through the top level index.php file. The query
 string (the component of the url after the ?) then says who is responsible
 for handling the request. In this query string there is a part which reads
 c= ... This says which controller should be used. The controller uses
@@ -809,74 +822,74 @@ the rest of the query string such as the arg= variable to determine
 which data must be retrieved from which models, and finally which view
 with what elements on it should be displayed back to the user.</dd>
 <dt>css</dt><dd>This folder contains the stylesheets used to control
-how web page tags should look for the Yioop! site when rendered in a
+how web page tags should look for the Yioop site when rendered in a
 browser.</dd>
-<dt>data</dt><dd>This folder contains a default sqlite database for a new Yioop!
+<dt>data</dt><dd>This folder contains a default sqlite database for a new Yioop
 installation. Whenever the WORK_DIRECTORY is changed it is this database
 which is initially copied into the WORK_DIRECTORY to serve as the database
-of allowed users for the Yioop! system.</dd>
+of allowed users for the Yioop system.</dd>
 <dt>examples</dt><dd>This folder contains a file search_api.php
-whose code gives an example of how to use the Yioop! search function api.</dd>
+whose code gives an example of how to use the Yioop search function api.</dd>
 <dt>lib</dt><dd>This folder is short for library. It contains all the common
 classes for things like indexing, storing data to files, parsing urls, etc.
 lib contains six subfolders: <i>archive_bundle_iterators</i>,
-<i>compressors</i>, <i>index_bundle_iterators</i>, <i>indexing_plugins</i>,
-<i>processors</i>, and <i>stemmers</i>. The <i>archive_bundle_iterators</i>
-folder has iterators for iterating over the objects of various kinds of
-web archive file formats, such as arc, wiki-media, etc.
+<i>compressors</i>, <i>index_bundle_iterators</i>, <i>indexing_plugins</i>,
+<i>processors</i>, and <i>stemmers</i>. The <i>archive_bundle_iterators</i>
+folder has iterators for iterating over the objects of various kinds of
+web archive file formats, such as arc, wiki-media, etc.
 These iterators are used to iterate over such archives during
 a recrawl. The <i>compressors</i> folder contains classes that might be used
-to compress objects in a web_archive. The <i>index_bundle_iterator</i>
-folder contains a variety of iterators useful for iterating over lists of
-documents which might be returned during a query to the search engine.
+to compress objects in a web_archive. The <i>index_bundle_iterator</i>
+folder contains a variety of iterators useful for iterating over lists of
+documents which might be returned during a query to the search engine.
 The <i>processors</i> folder contains processors to extract page summaries for
 a variety of different mimetypes. The <i>stemmers</i> folder is where word
 stemmers for different languages would appear. Right now only an
 English porter stemmer is present in this folder.</dd>
 <dt>locale</dt><dd>This folder contains the default locale data which comes
-with the Yioop! system. A locale encapsulates data associated with a
-language and region. A locale is specified by an
-<a href='http://en.wikipedia.org/wiki/IANA_language_tag'>IETF language tag</a>.
+with the Yioop system. A locale encapsulates data associated with a
+language and region. A locale is specified by an
+<a href='http://en.wikipedia.org/wiki/IANA_language_tag'>IETF language tag</a>.
 So for instance, within the locale folder there is a folder en-US for the
 locale consisting of English in the United States. Within a given locale tag
 folder there is a file configure.ini which contains translations of
- string ids to string in the language of the locale. This approach is
- the same idea as used in <a
+ string ids to strings in the language of the locale (a sample entry is
+ sketched after this list of folders). This approach is the same idea as
+ used in <a
  href='http://en.wikipedia.org/wiki/Gettext'>Gettext</a> .po files.
 Yioop's approach requires neither a compilation step nor a restart of the
 webserver for translations to appear. On the other hand, it is slower than the
 Gettext approach, but this could be easily mitigated using a memory cache such
-as <a href="http://memcached.org/">memcached</a> or <a
-href="http://php.net/manual/en/book.apc.php">apc</a>. Besides the file
-configure.ini, there is a statistics.txt file which has info about what
+as <a href="http://memcached.org/">memcached</a> or <a
+href="http://php.net/manual/en/book.apc.php">apc</a>. Besides the file
+configure.ini, there is a statistics.txt file which has info about what
 percentage of the ids have been translated. In addition to configure.ini and
 statistics.txt, the locale folder for a language contains two sub-folders:
 pages, containing static html (with extension .thtml) files which might need
 to be translated, and resources. The resources folder contains files:
 suggest-trie.txt.gz, a <a href="http://en.wikipedia.org/wiki/Trie"
 >Trie data structure</a> used for search bar word suggestions and tokenizer.php
-which either specifies the number of characters for this language to
+which either specifies the number of characters for this language to
 constitute a char gram or contains a stemmer class used to stem terms for
-this language. This folder might also contain a Bloom filter file with a name
+this language. This folder might also contain a Bloom filter file with a name
 like all_word_grams.ftr which would be used to do word gramming of sequences of
-words that should be treated as a unit, for example, "Honda Accord" or
+words that should be treated as a unit, for example, "Honda Accord" or
 "Bill Clinton".
 </dd>
 <dt>models</dt><dd>This folder contains the subclasses of Model used by
-Yioop! Models are used to encapsulate access to secondary storage.
+Yioop. Models are used to encapsulate access to secondary storage,
 i.e., accesses to databases or the filesystem. They are responsible
 for marshalling/de-marshalling objects that might be stored in more
 than one table or across several files. The models folder has
 within it a datasources folder. A datasource is an abstraction layer
 for the particular filesystem and database system that is being used
-by a Yioop! installation. At present, datasources have been defined
+by a Yioop installation. At present, datasources have been defined
 for sqlite, sqlite3, and mysql databases.</dd>
 <dt>resources</dt><dd>Used to store binary resources such as graphics, video,
-or audio. For now, just stores the Yioop! logo.</dd>
-<dt>scripts</dt><dd>This folder contains the Javascript files used by Yioop!
+or audio. For now, just stores the Yioop logo.</dd>
+<dt>scripts</dt><dd>This folder contains the Javascript files used by Yioop.
 </dd>
 <dt>tests</dt><dd>This folder contains unit tests for various lib
-components. Yioop! comes with its own minimal UnitTest class which is
+components. Yioop comes with its own minimal UnitTest class which is
 defined in lib/unit_test.php.</dd>
 <dt>views</dt><dd>This folder contains View subclasses as well
 as folders for elements, helpers, and layouts. A View is
@@ -886,28 +899,28 @@ responsible for communication between the fetchers and the queue_server
 output serialized objects. The elements folder contains Element classes which
 are typically used to output portions of web pages. For example, the
 html that allows one to choose an Activity in the Admin portion of the website
-is rendered by an ActivityElement. The helpers folder contains Helper
+is rendered by an ActivityElement. The helpers folder contains Helper
 subclasses. A Helper is used to automate the task of outputting certain
 kinds of web tags. For instance, the OptionsHelper when given an array
 can be used to output select tags and option tags using data from the array.
 The layout folder contains Layout subclasses. A Layout encapsulates the header
-and footer information for the kind of a document a View lives on. For example,
-web pages on the Yioop! site all use the WebLayout class as their Layout. The
-WebLayout class has a render  method for outputting the doctype, open html tag,
-head of the document including links for style sheets, etc. This method then
-calls the render methods of the current View, and finally outputs scripts and
+and footer information for the kind of document a View lives on. For example,
+web pages on the Yioop site all use the WebLayout class as their Layout. The
+WebLayout class has a render method for outputting the doctype, open html tag,
+head of the document including links for style sheets, etc. This method then
+calls the render methods of the current View, and finally outputs scripts and
 the necessary closing document tags.
 </dd>
 </dl>
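+<p>
+As a sketch of the configure.ini translation entries mentioned under the
+locale folder above, each entry pairs a string id used in the code with its
+translation for that locale (the id and value below are made up for
+illustration; see any folder under locale/ for real entries):
+</p>
+<pre>
+admin_view_title = "Admin | Yioop"
+</pre>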
-In addition, to the Yioop! application folder, Yioop! makes use of a
+In addition to the Yioop application folder, Yioop makes use of a
 WORK DIRECTORY. The location of this directory is set during the configuration
-of a Yioop! installation. Yioop! stores crawls, and other data local
-to a particular Yioop! installation in files and folders in this directory.
-In the event that you upgrade your Yioop! installation you should only
-need to replace the Yioop! application folder and in the configuration
-process of Yioop! tell it where your WORK DIRECTORY is. Of course, it
+of a Yioop installation. Yioop stores crawls and other data local
+to a particular Yioop installation in files and folders in this directory.
+In the event that you upgrade your Yioop installation, you should only
+need to replace the Yioop application folder and, in the configuration
+process of Yioop, tell it where your WORK DIRECTORY is. Of course, it
 is always recommended to back up one's data before performing an upgrade.
-Within the WORK DIRECTORY, Yioop! stores four main files: profile.php,
+Within the WORK DIRECTORY, Yioop stores four main files: profile.php,
 crawl.ini, bot.txt, and robot_table.txt. Here is a rough guide to what
 the WORK DIRECTORY's sub-folders contain:
     </p>
@@ -915,25 +928,25 @@ the WORK DIRECTORY's sub-folder contain:
 <dt>app</dt><dd>This folder is used to contain your overrides to
 the views, controllers, models, resources, etc. For example, if you
 wanted to change how the search results were rendered, you could
-ass a views/search_view.php file to the app folder and Yioop! would use
-it rather than the one in the Yioop! base directory's views folder.
+add a views/search_view.php file to the app folder and Yioop would use
+it rather than the one in the Yioop base directory's views folder.
 Using the app dir makes it easier to have customizations that won't get
-messed up when you upgrade Yioop!</dd>
+messed up when you upgrade Yioop.</dd>
 <dt>cache</dt><dd>This directory is used to store folders of the form
 ArchiveUNIX_TIMESTAMP, IndexDataUNIX_TIMESTAMP, and QueueBundleUNIX_TIMESTAMP.
 ArchiveUNIX_TIMESTAMP folders hold complete caches of web pages that have been
 crawled. These folders will appear on machines which are running fetcher.php.
 IndexDataUNIX_TIMESTAMP folders hold a word document index as well as summaries
 of pages crawled. A folder of this type is needed by the web app
-portion of Yioop! to serve search results. These folders can be moved from
+portion of Yioop to serve search results. These folders can be moved from
 machine to whichever machine you want to
 serve results from. QueueBundleUNIX_TIMESTAMP folders are used to maintain
 the priority queue during the crawling process. The queue_server.php program
-is responsible for creating both IndexDataUNIX_TIMESTAMP and
+is responsible for creating both IndexDataUNIX_TIMESTAMP and
 QueueBundleUNIX_TIMESTAMP folders.</dd>
-<dt>data</dt><dd>If an sqlite or sqlite3 (rather than say MySQL) database is
-being used then a seek_quarry.db file is stored in the data folder. In Yioop!,
-the database is used to manage users, roles, locales, and crawls. Data for
+<dt>data</dt><dd>If an sqlite or sqlite3 (rather than say MySQL) database is
+being used then a seek_quarry.db file is stored in the data folder. In Yioop,
+the database is used to manage users, roles, locales, and crawls. Data for
 crawls themselves are NOT stored in the database.</dd>
 <dt>log</dt><dd>When the fetcher and queue_server are run as daemon processes,
 log messages are written to log files in this folder. Log rotation is also done.
@@ -941,7 +954,7 @@ These log files can be opened in a text editor or console app.</dd>
 <dt>query</dt><dd>This folder is used to store caches of already performed
 queries when file caching is being used.</dd>
 <dt>schedules</dt><dd>This folder has three kinds of subfolders:
-IndexDataUNIX_TIMESTAMP, RobotDataUNIX_TIMESTAMP, and
+IndexDataUNIX_TIMESTAMP, RobotDataUNIX_TIMESTAMP, and
 ScheduleDataUNIX_TIMESTAMP. When a fetcher communicates with the web app
 to say what it has just crawled, the web app writes data into these
 folders to be processed later by the queue_server. The UNIX_TIMESTAMP
@@ -963,7 +976,7 @@ like Facebook only allow big search engines like Google to crawl them.
 Still there are many links to Facebook, so Facebook on an open web crawl
 will appear, but with a somewhat confused summary based only on link text;
 the results editor allows one to give a meaningful summary for Facebook.</dd>
-<dt>temp</dt><dd>This is used for storing temporary files that Yioop!
+<dt>temp</dt><dd>This is used for storing temporary files that Yioop
 creates during the crawl process. For example, temporary files used while
 making thumbnails. Each fetcher has its own temp folder, so you might
 also see folders 0-temp, 1-temp, etc.</dd>
@@ -971,19 +984,30 @@ also see folders 0-temp, 1-temp, etc.</dd>
     <p><a href="#toc">Return to table of contents</a>.</p>


-    <h2 id='interface'>The Yioop! Search and User Interface</h2>
+    <h2 id='interface'>The Yioop Search and User Interface</h2>
 <p>
-The main search form for Yioop! looks like:
+The main search form for Yioop looks like:
 </p>
 <img src='resources/SearchScreen.png' alt='The Search form'/>
 <p>The HTML for this form is in views/search_view.php and the icon is stored
-in resources/yioop.png. You may want to modify these to incorporate Yioop!
+in resources/yioop.png. You may want to modify these to incorporate Yioop
 search into your site. For more general ways to modify the look of these pages,
-consult the <a href="#framework">Building a site using Yioop! documentation</a>.
-The Yioop! logo on any screen in the Yioop!
+consult the <a href="#framework">Building a site using Yioop documentation</a>.
+The Yioop logo on any screen in the Yioop
 interface is clickable and returns the user to the main search screen.
-One performs a search by typing a query into the search form field and
-clicking the Search button. The [More Statistics] link only shows if under the
+One performs a search by typing a query into the search form field and
+clicking the Search button. As one is typing, Yioop suggests possible queries;
+you can click one of these suggestions, or select it using the up and down
+arrows, to perform that search.</p>
+<img src='resources/Autosuggest.png' alt='Example suggestions as you type'
+width="70%"/>
+<p>For some non-Roman alphabet scripts, such as Telugu, you can enter
+words phonetically using Roman letters and get suggestions
+in the script in question:</p>
+<img src='resources/TeluguAutosuggest.png' alt='Telugu suggestions for
+roman text' width="70%"/>
+<p>
+The [More Statistics] link only shows if under the
 Admin control panel you clicked on more statistics for the crawl. This link goes
 to a page showing many global statistics about the web crawl. Beneath
 this link are the Blog and Privacy links (as well as a link back to the
@@ -992,30 +1016,34 @@ through the Manage Locale activity. A typical search results might look like:
 </p>
 <img src='resources/SearchResults.png' alt='Example Search Results'
 width="70%"/>
+<p>If one slightly mistypes a query term, Yioop can sometimes suggest
+a spelling correction:</p>
+<img src='resources/SearchSpellCorrect.png' alt='Example Search Results
+with a spelling correction' width="70%"/>
 <p>Each result back from the query consists of several parts:
-First comes a title, which is a link to the page that matches the query term.
-This is followed by a brief summary of that page with the query words in bold.
+First comes a title, which is a link to the page that matches the query term.
+This is followed by a brief summary of that page with the query words in bold.
 Then the document rank, relevancy,
-proximity, and overall scores are listed. Each of these numbers
-is a grouped statistic -- several "micro index entry" are grouped
+proximity, and overall scores are listed. Each of these numbers
+is a grouped statistic -- several "micro index entries" are grouped
 together/summed to create each. So even though
 a given "micro index entry" might have a document rank between 1 and 10, their
-sum could be a larger value. Further, the overall score is a
-generalized inner product of the scores of the "micro index entries",
-so the separated scores will not typically sum to the overall score.
+sum could be a larger value. Further, the overall score is a
+generalized inner product of the scores of the "micro index entries",
+so the separated scores will not typically sum to the overall score.
 After these scores there are three links:
 Cached, Similar, and Inlinks. Clicking on Cached will display Yioop's downloaded
 copy of the page in question. We will describe this in more detail
-in a moment. Clicking on Similar causes Yioop! to locate the five
+in a moment. Clicking on Similar causes Yioop to locate the five
 words with the highest relevancy scores for that document and then to perform
 a search on those words. Clicking on Inlinks will take you to a page
-consisting of all the links that Yioop! found to the document in question.
-Finally, clicking on an IP address link returns all documents that were
+consisting of all the links that Yioop found to the document in question.
+Finally, clicking on an IP address link returns all documents that were
 crawled from that IP address.</p>
 <img src='resources/Cache.png' alt='Example Cache Results'
 width="70%"/>
 <p>As the above illustrates, on a cache link click,
-Yioop! will list the time of download and highlight
+Yioop will list the time of download and highlight
 the query terms. It should be noted that cached copies of web pages are
 stored on the fetcher which originally downloaded the page. The IndexArchive
 associated with a crawl is stored on the queue server and can be moved
@@ -1029,8 +1057,8 @@ can be viewed as an "SEO" view of the page.</p>
 <img src='resources/CacheSEO.png' alt='Example Cache SEO Results'
 width="70%"/>
 <p>In addition to a straightforward web search, one can also do image,
-video, news searches by clicking on the Images, Video, or News links in
-the top bar of Yioop search pages. Below are some examples of what these look
+video, and news searches by clicking on the Images, Video, or News links in
+the top bar of Yioop search pages. Below are some examples of what these look
 like for a search on "Obama":</p>
 <img src='resources/ImageSearch.png' alt='Example Image Search Results'
 width="70%"/>
@@ -1038,7 +1066,7 @@ width="70%"/>
 width="70%"/>
 <img src='resources/NewsSearch.png' alt='Example News Search Results'
 width="70%"/>
-<p>When Yioop! crawls a page it adds one of the following meta
+<p>When Yioop crawls a page it adds one of the following meta
 words to the page: media:text, media:image, or media:video. RSS feed
 sources that have been added to Media Sources under the <a href="#sources"
 >Search Sources</a>
@@ -1046,20 +1074,20 @@ activity are downloaded from each hour. Each RSS item on such a downloaded
 page has the meta word media:news added to it. A usual
 web search just takes the search terms provided to perform a search.
 An Images, Video, or News search tacks on to the search terms media:image,
-media:video, or media:news. Detection of images is done via mimetype at
-initial page download time. At this time a thumbnail is generated. When search
-results are presented it is this cached thumbnail that is shown. So image
-search does not leak information to third party sites. On any search results
-page with images, Yioop! tries to group the images into a thumbnail strip. This
-is true of both normal and images search result pages. In the case of image
-search result pages, except for not-yet-downloaded pages, this results in
-almost all of the results being the thumbnail strip. Video page detection is
-not done through mimetype as popular sites like YouTube, Vimeo, and others
-vary in how they use Flash or video tags to embed video on a web page. Yioop!
+media:video, or media:news. Detection of images is done via mimetype at
+initial page download time. At this time a thumbnail is generated. When search
+results are presented it is this cached thumbnail that is shown. So image
+search does not leak information to third party sites. On any search results
+page with images, Yioop tries to group the images into a thumbnail strip. This
+is true of both normal and image search result pages. In the case of image
+search result pages, except for not-yet-downloaded pages, this results in
+almost all of the results being the thumbnail strip. Video page detection is
+not done through mimetype as popular sites like YouTube, Vimeo, and others
+vary in how they use Flash or video tags to embed video on a web page. Yioop
 uses the Video Media sources that have been added in the Search Sources
-activity to detect whether a link is in the format of a video page. To get
+activity to detect whether a link is in the format of a video page. To get
 a thumbnail for the video it again uses the method for rewriting the video
-url to an image link specified for the particular site in question in
+url to an image link specified for the particular site in question in
 Search Sources; i.e., the thumbnail will be downloaded from the original site.
 <b>This could leak information to third party sites about your search.</b>
 </p>
@@ -1076,40 +1104,40 @@ to a page of search results listing all articles from that media source.
 For instance, if one were to click on the Yahoo News text above
 one would go to results for all Yahoo News articles. This is equivalent
 to doing a search on: media:news:Yahoo+News . If one clicks on the News
-subsearch, not having specified a query yet, then all stored
+subsearch, not having specified a query yet, then all stored
 news items in the current language will be displayed, roughly ranked by
 recentness. If one has RSS media sources which are set to be from
 different locales, then this will be taken into account on this blank query
 News page.</p>
 <p>Turning now to the topic of how to enter a query in Yioop:
-A basic query to the Yioop! search form is typically a sequence of
-words seperated by whitespace. This will cause Yioop! to compute a
+A basic query to the Yioop search form is typically a sequence of
+words separated by whitespace. This will cause Yioop to compute a
 "conjunctive query", it will look up only those documents which contain all of
-the terms listed. Yioop! also supports a variety of other search box
+the terms listed. Yioop also supports a variety of other search box
 commands and query types:</p>
 <ul>
-<li><b>#<em>num</em>#</b> in a query are treated as query presentation markers.
+<li><b>#<em>num</em>#</b> in a query are treated as query presentation markers.
 When a query is first parsed, it is split into columns with #<em>num</em>#
 as the column boundary. For example, bob #2# bob sally #3# sally #1#.
-A given column is used to present <em>num</em> results, where <em>num</em> is
-what is between the hash marks immediately after it. So in the query above,
-the subquery <em>bob</em> is used for the first two search results, then the
+A given column is used to present <em>num</em> results, where <em>num</em> is
+what is between the hash marks immediately after it. So in the query above,
+the subquery <em>bob</em> is used for the first two search results, then the
 subquery <em>bob sally</em> is used for the next three results, finally the last
-column is always used for any remaining results. In this case,
+column is always used for any remaining results. In this case,
 the subquery <em>sally</em> would be used for all remaining results even though
-its <em>num</em> is 1. If a query does not have any #<em>num</em>#'s it is
+its <em>num</em> is 1. If a query does not have any #<em>num</em>#'s, it is
 assumed that it has only one column.
 </li>
 <li>Separating query terms with a vertical bar | results in a disjunctive
-query. These are parsed for after the presentation markers above.
+query. These are parsed after the presentation markers above.
 So a search on: <em>Chris | Pollett</em> would return pages that have
 either the word Chris or the word Pollett or both.</li>
 <li>Putting the query in quotes, for example "Chris Pollett", will cause
-Yioop! to perform an exact match search. Yioop! in this case would only
+Yioop to perform an exact match search. Yioop in this case would only
 return documents that have the string "Chris Pollett" rather than just
 the words Chris and Pollett possibly not next to each other in the document.
-Also, using the quote syntax, you can perform searches such as
-"Chris * Homepage" which would return documents which have the word Chris
+Also, using the quote syntax, you can perform searches such as
+"Chris * Homepage" which would return documents which have the word Chris
 followed by some text followed by the word Homepage.
 </li>
 <li>If the query has at least one word not prefixed by -, then adding
@@ -1117,8 +1145,8 @@ a `-' in front of a word in a query means search for results not containing
 that term. So a search on: <em>of -the</em> would return results containing
 the word "of" but not containing the word "the".</li>
 <li>Searches of the forms: <b>related:url</b>, <b>cache:url</b>,
-<b>link:url</b>, <b>ip:ip_address</b> are equivalent to having clicked on the
-Similar, Cached, InLinks, IP address links, respectively, on a summary with
+<b>link:url</b>, <b>ip:ip_address</b> are equivalent to having clicked on the
+Similar, Cached, InLinks, IP address links, respectively, on a summary with
 that url and ip address.</li>
 </ul>
 <p>The remaining query types we list in alphabetical order:</p>
@@ -1145,12 +1173,12 @@ where the path was /).</li>
 <li><b>index:timestamp</b> or <b>i:timestamp</b> causes the search to
 make use of the IndexArchive with the given timestamp. So a search like:
 <em>Chris Pollett i:1283121141 | Chris Pollett</em>
-take results from the index with timestamp 1283121141 for
-Chris Pollett and unions them with results for Chris Pollett in the default
+takes results from the index with timestamp 1283121141 for
+Chris Pollett and unions them with results for Chris Pollett in the default
 index.</li>
 <li><b>if:keyword!add_keywords_on_true!add_keywords_on_false</b> checks the
 current conjunctive query clause for "keyword"; if present, it adds
-"add_keywords_on_true" to the clause, else it adds the keywords
+"add_keywords_on_true" to the clause, else it adds the keywords
 "add_keywords_on_true".  This meta word is typically used as part of a
 crawl mix. The else condition does not need to be present. As an example,
 <em>if:oracle!info:http://oracle.com/!site:none</em> might be added to
@@ -1160,7 +1188,7 @@ of a larger crawl mix this could be used to make oracle's homepage appear
 at the top of the query results. If you would like to inject multiple
 keywords then separate the keywords using plus rather than white space.
 For example, <i>if:corvette!fast+car</i>.</li>
-<li><b>info:url</b> returns the summary in the Yioop! index for the given url
+<li><b>info:url</b> returns the summary in the Yioop index for the given url
 only. For example, one could type info:http://www.yahoo.com/ or
 info:www.yahoo.com to get the summary for just the main Yahoo! page. This
 is useful for checking if a particular page is in the index.
@@ -1169,26 +1197,26 @@ is useful for checking if a particular page is in the index.
 whose language can be determined to match the given language tag.
 For example, <i>lang:en-US</i>.</li>
 <li><b>media:kind</b> returns summaries of all documents found
-of the given media kind. Currently, the text, image, news, and video are
+of the given media kind. Currently, text, image, news, and video are
 the four supported media kinds. So one can add to the
-search terms <em>media:image</em> to get only image results matching
+search terms <em>media:image</em> to get only image results matching
 the query keywords.</li>
-<li><b>mix:name</b> or <b>m:name</b> tells Yioop! to use the crawl mix "name"
+<li><b>mix:name</b> or <b>m:name</b> tells Yioop to use the crawl mix "name"
 when computing the results of the query. The section on mixing crawl indexes has
 more details about crawl mixes. If the name of the original mix had spaces,
 for example, <i>cool mix</i> then to use the mix you would need to replace
 the spaces with plusses, <i>m:cool+mix</i>.</li>
-<li><b>modified:Y</b>, <b>modified:Y-M</b>, <b>modified:Y-M-D</b>
+<li><b>modified:Y</b>, <b>modified:Y-M</b>, <b>modified:Y-M-D</b>
 returns summaries of all documents which were last modified on the given date.
 For example, <i>modified:2010-02</i> returns all documents which were last
 modified in February, 2010.</li>
-<li><b>no:some_command</b> is used to tell Yioop! not to perform some
+<li><b>no:some_command</b> is used to tell Yioop not to perform some
 default transformation of the search terms. For example, <i>no:guess</i>
-tells Yioop! not to try to guess the semantics of the search before
-doing the search. This would mean for instance, that Yioop! would not
+tells Yioop not to try to guess the semantics of the search before
+doing the search. This would mean for instance, that Yioop would not
 rewrite the query <i>yahoo.com</i> into <i>site:yahoo.com</i>.
-<i>no:network</i> tells Yioop! to only return search results from the
-current machine and not to send the query to all machines in the Yioop!
+<i>no:network</i> tells Yioop to only return search results from the
+current machine and not to send the query to all machines in the Yioop
 instance. <i>no:cache</i> says to recompute the query and not to make
 use of memcache or file cache.</li>
 <li><b>numlinks:some_number</b> returns summaries of all documents
@@ -1204,40 +1232,40 @@ path:/robots.txt would return summaries for all robots.txt files.</li>
 that user_agent_name (after lower casing). For example, <i>robot:yioopbot</i>
 would return all robots.txt pages explicitly having a rule for YioopBot.</li>
 <li><b>safe:boolean_value</b> is used to provide "safe" or "unsafe"
-search results. Yioop! has a crude, "hand-tuned", linear classifier for
+search results. Yioop has a crude, "hand-tuned", linear classifier for
 whether a site contains pornographic content. If one adds safe:true to
 a search, only those pages found which were deemed non-pornographic will
 be returned. Adding safe:false has the opposite effect.</li>
 <li><b>server:web_server_name</b> returns summaries of all documents
 served on that kind of web server. For example, <i>server:apache</i>.</li>
-<li><b>site:url</b>, <b>site:host</b>, or <b>site:domain</b> returns all of
+<li><b>site:url</b>, <b>site:host</b>, or <b>site:domain</b> returns all of
 the summaries of pages found at that url, host, or domain. As an example,
 <em>site:http://prints.ucanbuyart.com/lithograph_art.html</em>,
-<em>site:http://prints.ucanbuyart.com/</em>,
+<em>site:http://prints.ucanbuyart.com/</em>,
 <em>site:prints.ucanbuyart.com</em>, <em>site:.ucanbuyart.com</em>,
 <em>site:ucanbuyart.com</em>, <em>site:com</em>, will all return results with
-decreasing specificity. To return all pages listed in a Yioop! index you can
+decreasing specificity. To return all pages listed in a Yioop index you can
 do <i>site:all</i>.
 </li>
 <li><b>size:num_bytes</b> returns summaries of all documents whose download
 size was between num_bytes and num_bytes + 5000. num_bytes must be a multiple
 of 5000. For example, size:15000.</li>
 <li><b>time:num_seconds</b> returns summaries of all documents whose download
-time excluding DNS lookup time was between num_seconds and num_seconds + 0.5
+time excluding DNS lookup time was between num_seconds and num_seconds + 0.5
 seconds. For example, time:1.5.</li>
 <li><b>version:version_number</b> returns summaries of all documents
-served on web servers with the given version number.
+served on web servers with the given version number.
 For example, one might have a query <i>server:apache version:2.2.9</i>.</li>
 <li><b>weight:some_number</b> or <b>w:some_number</b> has the effect of
 multiplying all scores for this portion of a query by some_number. For example,
 <em>Chris Pollett | Chris Pollett site:wikipedia.org w:5</em>
-would  multiply scores satisfying Chris Pollett  and on wikipedia.org by
+would multiply scores satisfying Chris Pollett and on wikipedia.org by
 5 and union these with those satisfying Chris Pollett.
 </li>

 </ul>
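+<p>As a worked illustration combining several of the operators above, the
+hypothetical query below asks for English-language text documents on the
+sjsu.edu site that contain the exact phrase "Chris Pollett" but not the word
+student:</p>
+<pre>
+"Chris Pollett" site:sjsu.edu media:text lang:en-US -student
+</pre>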
-<p>In addition, to using the search form interface to query Yioop! it is also
-possible to query Yioop! and get results in Open Search RSS format. To
+<p>In addition to using the search form interface to query Yioop, it is also
+possible to query Yioop and get results in Open Search RSS format. To
 do that you can either directly type a URL into your browser of the form:</p>
 <pre>
 http://my-yioop-instance-host/?f=rss&amp;q=query+terms
@@ -1250,49 +1278,49 @@ the corner of the page with the main search form is a Settings-Signin element:
 <p>
 This element provides access for a user to change their search settings
 by clicking Settings. The Sign In link provides access to the Admin panel for
-the website.
+the website.
 </p>
 <img src='resources/Settings.png' alt='The Settings Form'/>
 <p>On the Settings page, there are currently three items which can be adjusted:
-The number of results per page when doing a search, the language Yioop! should
-use, and the particular search index Yioop! should use. When a user clicks
-save, the data is stored by Yioop! The user can then click "Return to Yioop!"
-to go back the search page. Thereafter, interaction with Yioop! will make
-use of any settings' changes. Data is stored in Yioop! and associated with
-a given user via a cookies mechanism. In order for this to work, the
+The number of results per page when doing a search, the language Yioop should
+use, and the particular search index Yioop should use. When a user clicks
+save, the data is stored by Yioop. The user can then click "Return to Yioop"
+to go back to the search page. Thereafter, interaction with Yioop will make
+use of any settings' changes. Data is stored in Yioop and associated with
+a given user via a cookies mechanism. In order for this to work, the
 user's browser must allow cookies to be set. This is usually the default
-for most browsers; however, it can sometimes be disabled in which case the
-browser option must be changed back to the default for Settings to work
+for most browsers; however, it can sometimes be disabled in which case the
+browser option must be changed back to the default for Settings to work
 correctly. It is possible to control some of these settings by tacking on
-stuff to the URL. For instance, adding &l=fr-FR to the URL query string
+extra parameters to the URL. For instance, adding &l=fr-FR to the URL query string
 (the portion of the URL after the question mark) would
-tell Yioop! to use the French from France for outputting
+tell Yioop to use French from France for outputting
 text. You can also add &its= followed by the Unix
 timestamp of the search index you want.
 </p>
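+<p>
+For example, a query URL selecting French from France and a particular index
+might look like the following (the host is a placeholder and the timestamp is
+just the sample one used earlier in this documentation):
+</p>
+<pre>
+http://my-yioop-instance-host/?q=bonjour&amp;l=fr-FR&amp;its=1283121141
+</pre>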
-<p>Clicking on the Sign In link on the corner of the Yioop! web site will
+<p>Clicking on the Sign In link on the corner of the Yioop web site will
 bring up the following form:
 </p>
 <img src='resources/SigninScreen.png' alt='Admin Panel Login'/>
 <p>
 Correctly entering a username and password will then bring the user to the
-Admin portion of the Yioop! website. Each Admin page has on it an Activity
+Admin portion of the Yioop website. Each Admin page has on it an Activity
 element as well as a main panel where the current activity is displayed.
 The Activity element allows the user to choose what is the current activity
-for this Admin session. The choices available on the Activity element
+for this Admin session. The choices available on the Activity element
 depend on the privileges the user has.
 Currently, for the root account, the Activity element looks like:
 </p>
 <img src='resources/ActivityElement.png' alt='The Activity Element'/>
 <p>
-Over the next several sections we will discuss each of the Yioop! admin
+Over the next several sections we will discuss each of the Yioop admin
 activities in turn. Before we do that, we make a couple of remarks about using
-Yioop! from a mobile device.
+Yioop from a mobile device.
 </p>
-    <h2 id='mobile'>Yioop! Mobile Interface</h2>
-    <p>Yioop!'s user interface is designed to display reasonably well as is
+    <h2 id='mobile'>Yioop Mobile Interface</h2>
+    <p>Yioop's user interface is designed to display reasonably well as is
     in tablet devices such as the iPad. For smart phones, such as
-    iPhone, Android, Blackberry, or Windows Phone, Yioop! has a separate
+    iPhone, Android, Blackberry, or Windows Phone, Yioop has a separate
     user interface. For search, settings, and login, this looks fairly
     similar to the non-mobile user interface:</p>
 <img src='resources/MobileSearch.png' alt='Mobile Search Landing Page'
@@ -1306,16 +1334,16 @@ Yioop! from a mobile device.
     with a drop-down:</p>
 <img src='resources/MobileAdmin.png' alt='Example Mobile Admin Activity'
     style="width:280px;height:280px"/>
-    <p>We now resume our discussion of how to use each of the Yioop! admin
+    <p>We now resume our discussion of how to use each of the Yioop admin
     activities for the default, non-mobile, setting, simply noting that
     except for the above minor changes, these instructions will also apply to
     the mobile setting.
     </p>
     <h2 id='passwords'>Managing Accounts</h2>
-    <p>By default, when a user first signs in to the Yioop! admin
+    <p>By default, when a user first signs in to the Yioop admin
     panel the current activity is the Manage Account activity. For now,
     this activity just lets users change their password using the form
-    pictured below. The intention is as Yioop! development continues
+    pictured below. The intention is that, as Yioop development continues,
     additional features might be configured in this activity.</p>
 <img src='resources/ChangePassword.png' alt='Change Password Form'/>

@@ -1326,9 +1354,9 @@ Yioop! from a mobile device.
 <img src='resources/ManageUser.png' alt='The Manage User form'/>
     <p>As one can see this activity has three forms associated with it.
     The first form can be used to add a new user with a given password
-    to the Yioop! system. The second form allows existing users to be deleted.
-    The last form allows one to add roles to or delete roles from an existing
-    user. Here the word "role" means a set of activities.
+    to the Yioop system. The second form allows existing users to be deleted.
+    The last form allows one to add roles to or delete roles from an existing
+    user. Here the word "role" means a set of activities.
     Adding a role to a user means that user, when signed in
     to the admin panel, can access the activities of that role. Roles
     are managed through the Manage Role activity, which looks like:</p>
@@ -1343,7 +1371,7 @@ Yioop! from a mobile device.


     <h2 id='crawls'>Managing Crawls</h2>
-    <p>The Manage Crawl activity in Yioop! looks like:</p>
+    <p>The Manage Crawl activity in Yioop looks like:</p>
 <img src='resources/ManageCrawl.png' alt='Manage Crawl Form'/>
     <p>
     This activity will actually list slightly different kinds of peak memory
@@ -1358,8 +1386,8 @@ Yioop! from a mobile device.
     the crawl as well as a Stop Crawl button. Crawling continues until this
     Stop Crawl button is pressed or until no new sites can be found. As a
     crawl occurs, a sequence of IndexShards is written. These keep track
-    of which words appear in which documents for groups of 50,000 or so
-    documents. In addition an IndexDictionary of which words appear in which
+    of which words appear in which documents for groups of 50,000 or so
+    documents. In addition, an IndexDictionary of which words appear in which
     shard is written to a separate folder and subfolders. When the Stop button
     is clicked the "tiers" of data in this dictionary need to be logarithmically
     merged; this process can take a couple of minutes, so after clicking stop
@@ -1368,31 +1396,31 @@ Yioop! from a mobile device.
     this stop button line is a link which allows you to change the
     crawl options of the currently active crawl. Changing the options on
     an active crawl may take some time to fully take effect as the currently
-    processing queue of urls needs to flush.
+    processing queue of urls needs to flush.
     At the bottom of the page is a table listing previously run crawls.
-    Next to each previously run crawl are three links. The first link lets you
-    resume this crawl, if this is possible, and say Closed otherwise.
-    Resume will cause Yioop! to look for unprocessed fetcher
-    data regarding that crawl, and try to load that into a fresh priority
+    Next to each previously run crawl are three links. The first link lets you
+    resume this crawl, if this is possible, and says Closed otherwise.
+    Resume will cause Yioop to look for unprocessed fetcher
+    data regarding that crawl, and try to load that into a fresh priority
     queue of to-crawl urls. If it can do this, crawling would continue.
-    The second link let's you set this crawl's result as the default index.
+    The second link lets you set this crawl's result as the default index.
     In the above picture there were only two saved crawls, the second of which
-    was set as the default index. When someone comes to your Yioop!
-    installation and does not adjust their settings, the default index is
-    used to compute search results. The final link allows one to Delete the
-    crawl. For both resuming a crawl and deleting a crawl, it might take a
-    little while before you see the process being reflected in the display.
-    This is because communication might need to be done with the various
-    fetchers, and because the on screen display refreshes only every 20 seconds
+    was set as the default index. When someone comes to your Yioop
+    installation and does not adjust their settings, the default index is
+    used to compute search results. The final link allows one to Delete the
+    crawl. For both resuming a crawl and deleting a crawl, it might take a
+    little while before you see the process being reflected in the display.
+    This is because communication might need to be done with the various
+    fetchers, and because the on-screen display refreshes only every 20 seconds
     or so.
     </p>
     <h3 id="prereqs">Prerequisites for Crawling</h3>
     <p>Before you can start a new crawl, you need to run at least one
     queue_server.php script and you need to run at least one fetcher.php script.
-    These can be run either from the same Yioop! installation or from
-    separate machines or folder with Yioop! installed. Each installation of
-    Yioop! that is going to participate in a crawl should be configured with the
-    same name server and server key. Running these scripts can be done either
+    These can be run either from the same Yioop installation or from
+    separate machines or folders with Yioop installed. Each installation of
+    Yioop that is going to participate in a crawl should be configured with the
+    same name server and server key. Running these scripts can be done either
     via the command line or through a web interface. As described in the
     <a href="#requirements">Requirements</a> section you might need to do some
     additional initial set up if you want to take the web interface approach.
@@ -1400,12 +1428,12 @@ Yioop! from a mobile device.
     only one queue server. You can still have more than one fetcher, but
     the crawl speed in this case probably won't go faster after ten to
     twelve fetchers. Also, in the command-line approach the queue server and
-    name server should be the same instance of Yioop! In the remainder of this
-    section we describe how to start the queue_server.php and
+    name server should be the same instance of Yioop. In the remainder of this
+    section we describe how to start the queue_server.php and
     fetcher.php scripts via the command line; the <a href="#machines"
     >GUI for Managing Machines and Servers</a> section describes how to do
-    it via a web interface. To begin open a
-    command shell and cd into the bin subfolder of the Yioop! folder. To
+    it via a web interface. To begin, open a
+    command shell and cd into the bin subfolder of the Yioop folder. To
     start a queue_server type:</p>
     <pre>
 php queue_server.php terminal</pre>
@@ -1453,24 +1481,24 @@ php fetcher.php stop</pre>
     necessary to perform a crawl we now return to how to set the
     options for how the crawl is conducted.</p>
     <h3>Common Crawl and Search Configurations</h3>
-    <p>When testing Yioop!, it is quite common just to have one instance
+    <p>When testing Yioop, it is quite common just to have one instance
     of the fetcher and one instance of the queue_server running, both on
-    the same machine and same installation of Yioop! In this subsection
+    the same machine and same installation of Yioop. In this subsection
     we wish to briefly describe some
     other configurations which are possible and also some configs/config.php
     settings that can affect the crawl and search speed. The most obvious
-    config.php setting which can affect the crawl speed is
+    config.php setting which can affect the crawl speed is
     NUM_MULTI_CURL_PAGES. A fetcher when performing downloads, opens this
     many simultaneous connections, gets the pages corresponding to them,
-    processes them, then proceeds to download the next batch of
-    NUM_MULTI_CURL_PAGES pages. Yioop! uses the fact that there are gaps
+    processes them, then proceeds to download the next batch of
+    NUM_MULTI_CURL_PAGES pages. Yioop uses the fact that there are gaps
     in this loop where no downloading is being done to ensure robots.txt
     Crawl-delay directives are being honored (a Crawl-delayed host will
     only be scheduled to at most one fetcher at a time). The downside of this
     is that your internet connection might not be used to its fullest ability
     to download pages. Thus, it can make sense rather than increasing
-    NUM_MULTI_CURL_PAGES, to run multiple copies of the Yioop! fetcher on a
-    machine. To do this one can either install the Yioop! software multiple
+    NUM_MULTI_CURL_PAGES, to run multiple copies of the Yioop fetcher on a
+    machine. To do this one can either install the Yioop software multiple
     times or give an instance number when one starts a fetcher. For example:</p>
 <pre >
 php fetcher.php start 5
@@ -1483,70 +1511,70 @@ php fetcher.php start 5
     the data for the hosts that queue server crawled. Putting the WORK_DIRECTORY
     on a solid-state drive can, as you might expect, greatly speed-up how fast
     search results will be served. Unfortunately, if a given queue server
-    is storing ten million or so pages, the corresponding
-    IndexDataUNIX_TIMESTAMP folder might be around 200 GB. Two main sub-folders
+    is storing ten million or so pages, the corresponding
+    IndexDataUNIX_TIMESTAMP folder might be around 200 GB. Two main sub-folders
     of IndexDataUNIX_TIMESTAMP largely determine the search performance of
-    Yioop! handling queries from a crawl. These are the dictionary subfolder
+    Yioop handling queries from a crawl. These are the dictionary subfolder
     and the posting_doc_shards subfolder, where the former has the greater
-    influence. For the ten million page situation these might be 5GB and 30GB
+    influence. For the ten million page situation these might be 5GB and 30GB
     respectively. It is completely possible to copy these subfolders to
     a SSD and use symlinks to them under the original crawl directory to
-    enhance Yioop!'s search performance.</p>
+    enhance Yioop's search performance.</p>
     <h3>Specifying Crawl Options and Modifying Options of the Active Crawl</h3>
     <p>As we pointed out above, next to the Start Crawl button is an Options
     link. Clicking on this link lets you set various aspects of how
     the next crawl should be conducted. As we mentioned before, if there is
     a currently processing crawl there will be an options link under its stop
     button. Both of these links lead to similar pages; however, for an active
-    crawl fewer parameters can be changed. So we will only describe the first
+    crawl, fewer parameters can be changed. So we will only describe the first
     link. We do mention here though that under the active crawl options page
     it is possible to inject new seed urls into the crawl as it is progressing.
     In the case of clicking the Options
-    link next to the start button, the user should be taken to an
+    link next to the start button, the user should be taken to an
     activity screen which looks like:</p>
 <img src='resources/WebCrawlOptions.png' alt='Web Crawl Options Form'/>
     <p>The Back link in the corner returns one to the previous activity.</p>
-    <p>There are two kinds of crawls that can be performed by Yioop!
+    <p>There are two kinds of crawls that can be performed by Yioop:
     either a crawl of sites on the web or a crawl of data that has been
-    previously stored in a supported archive format such as data that was
-    crawled by Versions 0.66 and above of Yioop!,
+    previously stored in a supported archive format such as data that was
+    crawled by Versions 0.66 and above of Yioop,
     <a href="http://www.archive.org/web/researcher/ArcFileFormat.php">Internet
-    Archive arc file</a>,
+    Archive arc file</a>,
     <a href="http://en.wikipedia.org/wiki/Wikipedia:Database_download"
-    >MediaWiki xml dump</a>, and
+    >MediaWiki xml dump</a>, and
     <a href="http://rdf.dmoz.org/"
     >Open Directory Project RDF file</a>. We will first concentrate on
     new web crawls and then return to archive crawls later.</p>
     <h4>Web Crawl Options</h4>
     <p>
-    On the web crawl tab, the first form field, "Get Crawl Options From",
+    On the web crawl tab, the first form field, "Get Crawl Options From",
     allows one to read in crawl options either from the default_crawl.ini file
-    or from the crawl options used in a previous crawl. The rest of the form
-    allows the user to change the existing crawl options. The second form field
-    is labeled Crawl Order. This can be set to either Bread First or Page
-    Importance. It specifies the order in which pages will be crawled. In
-    breadth first crawling, roughly all the seeds sites are visited first,
-    followed by sites linked directly from seed sites, followed by sites linked
+    or from the crawl options used in a previous crawl. The rest of the form
+    allows the user to change the existing crawl options. The second form field
+    is labeled Crawl Order. This can be set to either Breadth First or Page
+    Importance. It specifies the order in which pages will be crawled. In
+    breadth first crawling, roughly all the seed sites are visited first,
+    followed by sites linked directly from seed sites, followed by sites linked
     directly from sites linked directly from seed sites, etc. Page Importance is
     our modification of [<a href="#APC2003">APC2003</a>]. In this
     order, each seed site starts with a certain quantity of money.
     When a site is crawled it distributes its money equally amongst sites
     it links to. When picking sites to crawl next, one chooses those that
-    currently have the most money. Additional rules are added to handle things
-    like the fact that some sites might have no outgoing links. Also, in our
-    set-up we don't revisit already seen sites. To handle these situation we
-    take a different tack from the original paper. This crawl order roughly
+    currently have the most money. Additional rules are added to handle things
+    like the fact that some sites might have no outgoing links. Also, in our
+    set-up we don't revisit already seen sites. To handle these situations, we
+    take a different tack from the original paper. This crawl order roughly
     approximates crawling according to page rank.</p>
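    <p>As a small worked example of this money model (the numbers are made up
    for illustration), suppose a seed site holding 1 unit of money links to
    four pages. When the seed is crawled, each linked page receives
    1/4 = 0.25 units; a page that receives money from several crawled pages
    accumulates the sum, and so tends to be picked for crawling before a page
    funded by only one.</p>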
     <p>The next checkbox is labelled Restrict Sites by Url. If it
     is checked then a textarea with label Allowed To Crawl Sites appears.
-    If one checks Restricts Sites by Url then only pages on those sites and
-    domains listed in the Allowed To Crawl Sites textarea can be crawled.
+    If one checks Restrict Sites by Url, then only pages on those sites and
+    domains listed in the Allowed To Crawl Sites textarea can be crawled.
     We will say how to specify domains and sites in a moment; first, let's
     discuss the last two textareas on the Options form. The Disallowed sites
-    textarea allows you to specify sites that you do not want the crawler
-    to crawl under any circumstance. There are many reasons you might not want
-    a crawler to crawl a site. For instance, some sites might not have a
-    good robots.txt file, but will ban you from interacting with their site
+    textarea allows you to specify sites that you do not want the crawler
+    to crawl under any circumstance. There are many reasons you might not want
+    a crawler to crawl a site. For instance, some sites might not have a
+    good robots.txt file, but will ban you from interacting with their site
     if they get too much traffic from you. The Seed sites textarea allows
     you to specify a list of urls that the crawl should start from. The
     crawl will begin using these urls.
@@ -1557,7 +1585,7 @@ php fetcher.php start 5
     and title/descriptions) and in
     the Disallowed Sites/Sites with Quotas one can give a url
     followed by #. Otherwise,
-    in this common format, there should be one site, url, or domain per
+    in this common format, there should be one site, url, or domain per
     line. You should not separate sites and domains with commas or other
     punctuation. White space is ignored. A domain can be specified as:
     </p>
@@ -1573,45 +1601,45 @@ php fetcher.php start 5
     <p>would all fall under this domain. A site can be specified
     as scheme://domain/path. For example, https://www.somewhere.com/foo/ .
     Such a site includes https://www.somewhere.com/foo/anything_more .
-    Yioop! also recognizes * and $ within urls. So http://my.site.com/*/*/
-    would match http://my.site.com/subdir1/subdir2/rest and
+    Yioop also recognizes * and $ within urls. So http://my.site.com/*/*/
+    would match http://my.site.com/subdir1/subdir2/rest and
     http://my.site.com/*/*/$ would require the last symbol in the url
     to be '/'. This kind of pattern matching can be useful
     to restrict a crawl within a url to a certain fixed
     depth -- you can allow crawling a site,
     but disallow the downloading of pages with more than a certain number of
     '/' in them.</p>
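    <p>For instance (my.site.com is the hypothetical host from above), listing
    http://my.site.com/ under Allowed To Crawl Sites while listing the
    pattern</p>
    <pre>
    http://my.site.com/*/*/*/
    </pre>
    <p>under Disallowed Sites would let my.site.com be crawled, but stop the
    downloading of its pages nested more than two directories deep.</p>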
-    <p>In the Disallowed Sites/Sites with Quotas, a number after a # sign
+    <p>In the Disallowed Sites/Sites with Quotas, a number after a # sign
     indicates that at most that many
     pages should be downloaded from that site in any given hour. For example,
     </p>
     <pre>
     http://www.ucanbuyart.com/#100
     </pre>
-    <p>indicates that at most 100 pages are to be downloaded from
+    <p>indicates that at most 100 pages are to be downloaded from
     http://www.ucanbuyart.com/ per hour.</p>
     <p>In the seed site area one can specify title and page descriptions
-    for pages that Yioop! would otherwise be forbidden to crawl by the
+    for pages that Yioop would otherwise be forbidden to crawl by the
     robots.txt file. For example,</p>
     <pre>
 http://www.facebook.com/###!Facebook###!A%20famous%20social%20media%20site
     </pre>
-    <p>tells Yioop! to generate a placeholder page for
+    <p>tells Yioop to generate a placeholder page for
     http://www.facebook.com/ with title "Facebook" and description
-    "A famous social media site" rather than to attempt to download
+    "A famous social media site" rather than to attempt to download
     the page. The <a href="#editor">Results Editor</a> activity can only
-    be used to affect pages which are in a Yioop! index. This technique
+    be used to affect pages which are in a Yioop index. This technique
     allows one to add arbitrary pages to the index.</p>
-    <p>When configuring a new instance of Yioop! the file default_crawl.ini
+    <p>When configuring a new instance of Yioop the file default_crawl.ini
     is copied to WORK_DIRECTORY/crawl.ini and contains the initial settings
     for the Options form. </p>
     <p>The next part of the Edit Crawl Options form allows you to create
-    user-defined "meta-words". In Yioop! terminology, a meta-word is a word
-    which wasn't in a downloaded document, but which is added to the
-    inverted-index as if it had been in the document. The addition of
+    user-defined "meta-words". In Yioop terminology, a meta-word is a word
+    which wasn't in a downloaded document, but which is added to the
+    inverted-index as if it had been in the document. The addition of
     user-defined meta-words is specified by giving a pattern matching rule
-    based on the url. Unlike the sites field, for these fields we allow more
-    general regular expressions .For instance, in the figure above, the word
+    based on the url. Unlike the sites field, for these fields we allow more
+    general regular expressions. For instance, in the figure above, the word
     column has buyart and the url pattern column has:
     <pre>
     http://www.ucanbuyart.com/(.+)/(.+)/(.+)/(.+)/
@@ -1627,30 +1655,42 @@ http://www.facebook.com/###!Facebook###!A%20famous%20social%20media%20site
     </p>
     <p>The last part of the Edit Crawl Options form allows you to select which
     indexing plugins you would like to use during the crawl. For instance,
-    clicking the RecipePlugin checkbox would cause Yioop! to run the code
+    clicking the RecipePlugin checkbox would cause Yioop to run the code
     in indexing_plugins/recipe_plugin.php. This code tries to detect pages
     which are food recipes and separately extracts these recipes and clusters
     them by ingredient. The extraction of recipe pages is done by the
     pageProcessing
     callback in the RecipePlugin class of recipe_plugin.php; the clustering
     is done in RecipePlugin's postProcessing method. The first method is
-    called by Yioop! for each active plugin on each page downloaded. The second
-    method is called during the stop crawl process of Yioop!
+    called by Yioop for each active plugin on each page downloaded. The second
+    method is called during the stop crawl process of Yioop.
     </p>
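    <p>As a rough sketch of the shape of such a plugin (the class name, file
    name, and base class here are hypothetical; see
    indexing_plugins/recipe_plugin.php for a real example and the exact
    interface), an indexing plugin provides the two callbacks just
    mentioned:</p>
    <pre>
    // indexing_plugins/my_plugin.php (hypothetical)
    class MyPlugin extends IndexingPlugin
    {
        // called on each downloaded page while the crawl runs
        function pageProcessing($page, $url)
        {
            // examine $page and return any extracted sub-documents
        }
        // called once, during the stop crawl process
        function postProcessing($index_name)
        {
            // post-process (for example, cluster) the extracted data
        }
    }
    </pre>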
-    <h4>Archive Crawl Options</h4>
+    <h4 id="archive-crawl">Archive Crawl Options</h4>
     <p>We now consider how to do crawls of previously obtained archives.
     From the initial crawl options screen clicking on the Archive Crawl
     tab gives one the following form:</p>
 <img src='resources/ArchiveCrawlOptions.png' alt='Archive Crawl Options Form'/>
     <p>The drop down lists all previously done crawls that are available for
-    recrawl. These include both previously done Yioop crawls and crawls
+    recrawl.</p>
+<img src='resources/ArchiveCrawlDropDown.png' alt='Archive Crawl Drop Down'/>
+    <p>These include both previously done Yioop crawls, previously
+    done recrawls (prefixed with RECRAWL::), Yioop Crawl Mixes (prefixed with
+    MIX::), and crawls
     of other file formats such as arc, MediaWiki XML, and ODP RDF which
-    have been appropriately prepared in the PROFILE_DIR/cache folder.
-    You might want to re-crawl an existing Yioop! crawl if you want to add
+    have been appropriately prepared in the PROFILE_DIR/cache folder
+    (prefixed with ARCFILE::).
+    You might want to re-crawl an existing Yioop crawl if you want to add
     new meta-words or if you are migrating a crawl from an older version
-    of Yioop! for which the index isn't readable by your newer version of
-    Yioop! (You can even re-recrawl if you want).
+    of Yioop for which the index isn't readable by your newer version of
+    Yioop. For similar reasons, you
+    might want to recrawl a previously re-crawled crawl. When you
+    archive crawl a crawl mix, Yioop does a search on the keyword
+    <tt>site:any</tt> using the crawl mix in question. The results are then
+    indexed into a new archive. This new archive might have considerably
+    better query performance (in terms of speed) as compared to queries
+    performed on the original crawl mix. How to make a crawl mix is
+    described in the <a href="#mixes">Crawl Mixes</a> section.
     You might want to do an archive crawl of other file formats
-    if you want Yioop! to be able to provide search results of their content.
+    if you want Yioop to be able to provide search results of their content.
     Once you have selected the archive you want to crawl, you can add meta
     words as discussed in the previous section and then save your options
     and go back to the Create Crawl screen to start your crawl. As with
@@ -1660,9 +1700,9 @@ http://www.facebook.com/###!Facebook###!A%20famous%20social%20media%20site
     that was used in the creation process should be running.</p>
     <p>To get Yioop to detect arc, MediaWiki, and ODP RDF files you need
     to create a PROFILE_DIR/cache/archives folder on the name
-    server machine. Yioop! checks subfolders of this for
+    server machine. Yioop checks subfolders of this for
     files with the name arc_description.ini. For example, to do a Wikimedia
-    archive crawl, one could make a subfolder
+    archive crawl, one could make a subfolder
     PROFILE_DIR/cache/archives/my_wiki_media_files and put in it a
     file arc_description.ini in the format to be discussed in a moment.
     The arc_description.ini file's contents are used to give a description
@@ -1679,7 +1719,7 @@ description = 'English Wikipedia 2012';
     file in Internet Archive arc format one can use:</p>
     <pre>
 ArcArchiveBundle
-    </pre>
+    </pre>
     <p>For Media Wiki xml, one uses the arc_type:</p>
     <pre>
 MediaWikiArchiveBundle
@@ -1688,49 +1728,55 @@ MediaWikiArchiveBundle
     <pre>
 OdpRdfArchiveBundle
     </pre>
-    <p>In addition, to the arc_description.ini file, remember that the subfolder
-    should also contain instances of the files in question that you would like
-    to archive crawl. So for arc files, these would be files of extension
-    .arc.gz; for MediaWiki, files of extension .xml.bz2;
+    <p>In addition to the arc_description.ini file, remember that the subfolder
+    should also contain instances of the files in question that you would like
+    to archive crawl. So for arc files, these would be files of extension
+    .arc.gz; for MediaWiki, files of extension .xml.bz2;
     and for ODP-RDF, files of extension .rdf.u8.gz .
     </p>
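    <p>Putting these pieces together, the complete arc_description.ini for
    the hypothetical my_wiki_media_files folder above might read:</p>
    <pre>
    arc_type = 'MediaWikiArchiveBundle';
    description = 'English Wikipedia 2012';
    </pre>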

     <p><a href="#toc">Return to table of contents</a>.</p>
-
+
     <h2 id='mixes'>Mixing Crawl Indexes</h2>
-    <p>Once you have performed a few crawls with Yioop!, you can use the Mix
-    Crawls activity to create mixture of your crawls. The main Mix Crawls
+    <p>Once you have performed a few crawls with Yioop, you can use the Mix
+    Crawls activity to create mixtures of your crawls.
+    This section describes how to create crawl mixes which are processed
+    when a query comes in to Yioop. Once one has created such a crawl
+    mix, one can make a new index which consists of results of the
+    crawl mix ("materialize it") by doing an archive crawl of the crawl mix.
+    The <a href="#archive-crawl">Archive Crawl Options</a> subsection has more
+    details on how to do this latter operation. The main Mix Crawls
     activity looks like:</p>
     <img src='resources/ManageMixes.png' alt='The Manage Mixes form'/>
     <p>The first form allows you to name and create a new crawl mixture.
-    Clicking "Create" sends you to a second page where you can provide
-    information about how the mixture should be built. Beneath the Create mix
+    Clicking "Create" sends you to a second page where you can provide
+    information about how the mixture should be built. Beneath the Create mix
     form is a table listing all the previously created crawl mixes. The
-    first column has the name of the mix, the second column says how the
+    first column has the name of the mix, the second column says how the
     mix is built out of component crawls, and the actions columns allows you
-    to edit the mix, set it as the default index for Yioop! search results, or
-    delete the mix. You can also append "m:name+of+mix" or "mix:name+of+mix"
-    to a query to use that quiz without having to set it as the index.
-    When you create a new mix it also shows up on the Settings
-    page. Creating a new mix or editing an existing mix sends you to a second
+    to edit the mix, set it as the default index for Yioop search results, or
+    delete the mix. You can also append "m:name+of+mix" or "mix:name+of+mix"
+    to a query to use that mix without having to set it as the index.
+    When you create a new mix it also shows up on the Settings
+    page. Creating a new mix or editing an existing mix sends you to a second
     page:</p>
     <img src='resources/EditMix.png' alt='The Edit Mixes form'/>
     <p>Using the "Back" link on this page will take you to the prior screen.
     The first text field on the edit page lets you rename your mix if you so
     desire. Beneath this is an "Add Groups" button. A group is a weighted
     list of crawls. If only one group were present, then search results would
-    come from any crawl listed for this group. A given result's score
-    would be the weighted sum of the scores of the crawls in the group it
-    appears in. Search results  are displayed in descending order according to
-    this total score. If more that one group is present then the number of
-    results field for that group determines how many of the displayed results
+    come from any crawl listed for this group. A given result's score
+    would be the weighted sum of the scores of the crawls in the group it
+    appears in. Search results  are displayed in descending order according to
+    this total score. If more than one group is present then the number of
+    results field for that group determines how many of the displayed results
     should come from that group.
     For the Crawl Mix displayed above, there are three groups: The first group
     is used to display the first result, the second group is used to display
     the second result, the last group is used to display any remaining search
     results.</p>
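    <p>As a hypothetical numeric illustration of this scoring, if a group
    contained crawl A with weight 2 and crawl B with weight 1, and a given
    page scored 0.3 in A and 0.4 in B, its total score for that group would
    be 2(0.3) + 1(0.4) = 1.0.</p>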
     <p>The UI for groups works as follows: The top row has three columns.
-    To add new components to a group use the drop-down in the first column.
+    To add new components to a group use the drop-down in the first column.
     The second column controls for how many results
     the particular crawl group should be used. Different groups' results are
     presented in the order they appear in the crawl mix. The last group is
@@ -1746,9 +1792,9 @@ OdpRdfArchiveBundle
     get whatever results from this crawl that consisted of text rather than
     image pages. Keywords can be used to make a particular component of
     a crawl mix behave in a conditional manner by using the "if:" meta word
-    described in the search and user interface section. The last link in a
-    crawl row allows you to delete a crawl from a crawl group. For changes on
-    this page to take effect, the "Save" button beneath this drop-down must
+    described in the search and user interface section. The last link in a
+    crawl row allows you to delete a crawl from a crawl group. For changes on
+    this page to take effect, the "Save" button beneath this drop-down must
     be clicked.
     </p>
     <p><a href="#toc">Return to table of contents</a>.</p>
@@ -1760,24 +1806,24 @@ OdpRdfArchiveBundle
     any given web page should be downloaded. Smaller numbers reduce the
     requirements on disk space needed for a crawl; bigger numbers would
     tend to improve the search results. The next drop-down,
-    Allow Page Recrawl After, controls how many days that Yioop! keeps
+    Allow Page Recrawl After, controls how many days that Yioop keeps
     track of all the URLs that it has downloaded from. For instance, if one
-    sets this drop-down to 7, then after seven days Yioop! will clear its
-    Bloom Filter files used to store which urls have been downloaded, and it
+    sets this drop-down to 7, then after seven days Yioop will clear its
+    Bloom Filter files used to store which urls have been downloaded, and it
     would be allowed to recrawl these urls again if they appear in links. It
     should be noted that all of the information from before the seven
-    days will still be in the index, just that now Yioop! will be able to
-    recrawl pages that it had previously crawled. Besides letting Yioop!
+    days will still be in the index, just that now Yioop will be able to
+    recrawl pages that it had previously crawled. Besides letting Yioop
+    get a fresher version of pages it already has, this also has the benefit
-    of speeding up longer crawls as Yioop! doesn't need to check as many
+    of speeding up longer crawls as Yioop doesn't need to check as many
     Bloom filter files. In particular, it might just use one and keep it in
     memory. The Page File Types to Crawl checkboxes allow you to decide
     which file extensions you want Yioop to download during a crawl. Finally,
     the Title Weight, Description Weight, Link Weight fields are used by
-    Yioop! to decide how to weight each portion of a document when it returns
-    query results to you. The Save button of course saves any changes you
+    Yioop to decide how to weight each portion of a document when it returns
+    query results to you. The Save button of course saves any changes you
     make on this form.</p>
-    <p>It should be pointed out that the settings on this form (except the
+    <p>It should be pointed out that the settings on this form (except the
     weight fields) only affect future crawls -- they do not affect
     any crawls that have already occurred or are ongoing.</p>
     <h2 id='editor'>Results Editor</h2>
@@ -1787,27 +1833,27 @@ OdpRdfArchiveBundle
     one to fix these issues without having to do a completely new crawl.
     It has three main forms: an edited urls form, a url editing form,
     and a filter websites form.</p>
-    <p>If one has already edited the summary for
+    <p>If one has already edited the summary for
     a url, then the drop-down in the edited urls form will list this url. One
     can select it and click load to get it to display in the url editing
     form. The purpose of the url editing form is to allow a user to change
     the title and description for a url that appears on a search results
     page. Filling out the three fields of the
     url editing form, or loading values into them through the previous form
-    and changing them, and then clicking save, updates the appearance of the
+    and changing them, and then clicking save, updates the appearance of the
     summary for that url. To return to using the default summary, one only fills
     out the url field, leaves the other two blank, and saves.
     This form does not affect whether the page is looked up for a given query,
     only its final appearance. It can only be used to edit the appearance
     of pages which appear in the index, not to add pages to the index. Also,
     the edit will affect the appearance of that page for all indexes managed
-    by Yioop! If you know there is a page that won't be crawled by
-    Yioop!, but would like it to appear in an index, please look at the crawl
+    by Yioop. If you know there is a page that won't be crawled by
+    Yioop, but would like it to appear in an index, please look at the crawl
     options section of <a href="#crawls">Manage Crawls</a> documentation.
     </p>
-    <p>To understand the filter websites form, recall the disallowed sites
+    <p>To understand the filter websites form, recall the disallowed sites
     crawl option allows a user to specify that they
-    don't want Yioop! to crawl a given web site. After a crawl is done
+    don't want Yioop to crawl a given web site. After a crawl is done
     though, one might be asked to remove a website from the crawl results,
     or one might want to remove a website from the crawl results because it
     has questionable content. A large crawl can take days to replace; to
@@ -1815,10 +1861,10 @@ OdpRdfArchiveBundle
     a replacement crawl where the site has been disallowed, one can use
     a search filter.</p>
 <img src='resources/ResultsEditor.png' alt='The Results Editor form'/>
-    <p>Using the filter websites form one can specify a list of hosts which
+    <p>Using the filter websites form one can specify a list of hosts which
     should be excluded from the search results. The sites listed in the
     Sites to Filter text area are required to be hostnames. Using
-    a filter, any web page with the same host name as one listed in
+    a filter, any web page with the same host name as one listed in
     the Sites to Filter will not appear in the search results. So for example,
     the filter settings in the example image above contain the line
     http://www.cs.sjsu.edu/, so given these settings, the web page
@@ -1827,25 +1873,25 @@ OdpRdfArchiveBundle
     <p><a href="#toc">Return to table of contents</a>.</p>
     <h2 id='sources'>Search Sources</h2>
     <p>The Search Sources activity is used to manage the media sources
-    available to Yioop!, and also to control the subsearch links displayed
+    available to Yioop, and also to control the subsearch links displayed
     on the top navigation bar. The Search Sources activity looks like:</p>
 <img src='resources/SearchSources.png' alt='The Search Sources form'/>
-    <p>The top form is used to add a media source to Yioop! Currently,
+    <p>The top form is used to add a media source to Yioop. Currently,
     the Media Kind can be either Video or RSS. Video Media sources
-    are used to help Yioop! recognize links which are of videos on
+    are used to help Yioop recognize links which are of videos on
     a web video site such as YouTube. This helps in both tagging
     such pages with the meta word media:video in a Yioop index, and
     in being able to render a thumbnail of the video in the search results.
     When the media kind is set to video, this form has three fields:
     Name, which should be a short familiar name for the video site (for example,
-    YouTube); URL, which should consist of a url pattern by which to
+    YouTube); URL, which should consist of a url pattern by which to
     recognize a video on that site; and Thumb, which consists of a url pattern
     to replace the original pattern by to find the thumbnail for that video.
     For example, the value of URL for YouTube is:</p>
     <pre>
     http://www.youtube.com/watch?v={}&
     </pre>
-    <p>This will match any url which begins with
+    <p>This will match any url which begins with
     http://www.youtube.com/watch?v= followed by some string followed by
     &amp; followed by another string. The {} indicates that from
     v= to the &amp; should be treated as the identifier for the video. The
@@ -1855,7 +1901,7 @@ OdpRdfArchiveBundle
     http://img.youtube.com/vi/{}/2.jpg
     </pre>
     <p>If the identifier in the first video link was yv0zA9kN6L8, then
-    using the above, when displaying a thumb for the video, Yioop!
+    using the above, when displaying a thumb for the video, Yioop
     would use the image source:</p>
     <pre>
    http://img.youtube.com/vi/yv0zA9kN6L8/2.jpg
@@ -1873,12 +1919,12 @@ OdpRdfArchiveBundle
     RSS feed; URL, the url of the RSS feed; and Language, what language
     the RSS feed is in. This last element is used to control whether or
     not a news item will display given the current language settings of
-    Yioop! If under the Configure activity, the subsearch checkbox
-    is checked so that subsearches are displayed, then Yioop! will
+    Yioop. If under the Configure activity, the subsearch checkbox
+    is checked so that subsearches are displayed, then Yioop will
     try to download its list of RSS feeds hourly. This does not need
     a queue_server or a fetcher running, and is accomplished by making
-    a curl request from the web app to the sites in question on the
-    first search performed on Yioop! after an hour has elapsed between
+    a curl request from the web app to the sites in question on the
+    first search performed on Yioop after an hour has elapsed since
     the last RSS download.</p>
     <p>Beneath this top form is a table listing all the currently
     added media sources, their urls, and a link that allows one to delete
@@ -1901,7 +1947,7 @@ OdpRdfArchiveBundle
     subsearches and their properties. The actions column at the end of this
     table lets one either localize or delete a given subsearch. Clicking
     localize takes one to the Manage Locale's page for the default locale
-    and that parituclar subsearch localization identifier, so that you can
+    and that particular subsearch localization identifier, so that you can
     fill in a value for it. Remembering the name of this identifier,
     one can then in Manage Locales navigate to other locales, and fill
     in translations for them as well, if desired.</p>
@@ -1911,40 +1957,40 @@ OdpRdfArchiveBundle
     <a href="#prereqs">Prerequisites for Crawling</a> section, it is possible
     to start/stop and view the log files of queue servers and fetchers
     through the Manage Machines activity. In order to do this, the additional
-    requirements for this activity mentioned in the
+    requirements for this activity mentioned in the
     <a href="#requirements">Requirements</a> section must have been met.
     The Manage Machines activity looks like:</p>
 <img src='resources/ManageMachines.png' alt='The Manage Machines form'/>
     <p>The Add machine form at the top of the page allows one to add a new
-    machine to be controlled by this Yioop! instance. The Machine
+    machine to be controlled by this Yioop instance. The Machine
     Name field lets you give this machine an easy to remember name.
-    The Machine URL field should be filled in with the URL to the
-    installed Yioop! instance. The is Mirror checkbox says whether you want
-    the given Yioop! installation to act as a mirror for another Yioop!
+    The Machine URL field should be filled in with the URL to the
+    installed Yioop instance. The Is Mirror checkbox says whether you want
+    the given Yioop installation to act as a mirror for another Yioop
     installation. Checking it will reveal a drop-down menu that allows you
     to choose which installation amongst the previously entered machines
     you want to mirror. The Has Queue Server checkbox is used to say whether
-    the given Yioop! installation will be running a queue server or not.
-    Finally, the  Number of Fetchers drop down allows you to say how many
-    fetcher instances you want to be able to manage for that machine.
-    The Delete Machine form allows you to remove a machine that you either
-    misconfigured  or that you no longer want to manage through this Yioop!
-    instance. To modify a machine that you have already added, you should
+    the given Yioop installation will be running a queue server or not.
+    Finally, the Number of Fetchers drop down allows you to say how many
+    fetcher instances you want to be able to manage for that machine.
+    The Delete Machine form allows you to remove a machine that you either
+    misconfigured or that you no longer want to manage through this Yioop
+    instance. To modify a machine that you have already added, you should
     delete it and re-add it using the settings you want. The Machine Information
     section of the Manage Machines activity consists of boxes for
     each machine that you have added. Each box lists the queue server,
     if any, and each of the fetchers you requested to be able to manage.
     Next to these there is a link to the log file for that server/fetcher
-    and below this there is an On/Off switch for starting and stopping
+    and below this there is an On/Off switch for starting and stopping
     the server/fetcher. This switch is green if the server/fetcher is running
     and red otherwise. A similar On/Off switch is present to turn on
     and off mirroring on a machine that is acting as a mirror.</p>
-    <h2 id='localizing'>Localizing Yioop! to a New Language</h2>
-    <p>The Manage Locales activity can be used to configure Yioop!
+    <h2 id='localizing'>Localizing Yioop to a New Language</h2>
+    <p>The Manage Locales activity can be used to configure Yioop
     for use with different languages and for different regions. If you decide
-    to customize your Yioop! installation by adding files to
-    WORK_DIRECTORY/app as described in the <a href="framework">Building a
-    Site using Yioop! as a Framework</a> section, then the localization
+    to customize your Yioop installation by adding files to
+    WORK_DIRECTORY/app as described in the <a href="#framework">Building a
+    Site using Yioop as a Framework</a> section, then the localization
     tools described in this section can also be used to localize your custom
     site. Clicking the Manage Locales activity one sees a page like:</p>
 <img src='resources/ManagingLocales.png' alt='The Manage Locales form'/>
@@ -1962,8 +2008,8 @@ OdpRdfArchiveBundle
     of the page to the bottom from right-to-left as in Classical Chinese, and
     finally, tb-lr from the top of the page to the bottom from left-to-right
     as in non-cyrillic Mongolian. lr-tb and rl-tb support work better
-    than the vertical language support. As of this writing, only
-    Internet Explorer has some vertical language support and the Yioop!
+    than the vertical language support. As of this writing, only
+    Internet Explorer has some vertical language support and the Yioop
     stylesheets for vertical languages still need some tweaking.
     </p>
     <p>The second form for this activity allows one to delete an existing
@@ -1979,79 +2025,79 @@ OdpRdfArchiveBundle
     in the corner can be used to return to the previous form.
     The Static Pages drop-down has a list of all the static pages (.thtml files)
     which are in either the folder WORK_DIRECTORY/locale/current-tag/pages
-    (in this case, current-tag is en-US) or the folder
+    (in this case, current-tag is en-US) or the folder
     WORK_DIRECTORY/locale/default-tag/pages where default-tag is the IANA tag
-    for the default language of the Yioop! installation. Selecting a page
-    allows one to edit it within Yioop!. The idea is that one might have
+    for the default language of the Yioop installation. Selecting a page
+    allows one to edit it within Yioop. The idea is that one might have
     a couple of static pages created in the default locale pages folder
-    and a localizer can use this interface to see what is written in these
-    files. Yioop! autmatically creates these files in the directory the
+    and a localizer can use this interface to see what is written in these
+    files. Yioop automatically creates these files in the directory the
     localizer is localizing for, and the localizer can translate their contents
     into the appropriate language. Beneath this drop-down, the
     Edit Locale page mainly consists of a two column table: the right column
     being string ids, the left column containing what should be their
     translation into the given locale. If no translation exists yet,
-    the field will be displayed in red. String ids are extracted by Yioop!
+    the field will be displayed in red. String ids are extracted by Yioop
     automatically from controller, view, helper, layout, and element class files
+    which are either in the Yioop installation itself or in the installation
-    WORK_DIRECTORY/app folder. Yioop! looks for tl() function calls to extract
+    which are either in the Yioop Installation itself or in the installation
+    WORK_DIRECTORY/app folder. Yioop looks for tl() function calls to extract
     ids from these files, for example, on seeing tl('search_view_query_results')
-    Yioop! would extract the id search_view_query_results; on seeing
-    tl('search_view_calculated', $data['ELAPSED_TIME']) Yioop! would extract
+    Yioop would extract the id search_view_query_results; on seeing
+    tl('search_view_calculated', $data['ELAPSED_TIME']) Yioop would extract
     the id, 'search_view_calculated'. In the second case, the translation is
     expected to have a %s in it for the value of
     $data['ELAPSED_TIME']. Note %s is used regardless of the type, say
-    int, float, string, etc., of $data['ELAPSED_TIME']. tl() can handle
+    int, float, string, etc., of $data['ELAPSED_TIME']. tl() can handle
     additional arguments; whenever an additional argument is supplied, an
     additional %s would be expected somewhere in the translation string.
     If you make a set of translations, be sure to submit the form associated
-    with this table by scrolling to the bottom of the page and clicking the
-    Submit link. This saves your translations; otherwise, your work will be
+    with this table by scrolling to the bottom of the page and clicking the
+    Submit link. This saves your translations; otherwise, your work will be
     lost if you navigate away from this page. One aid to translating is that if
     you
     hover your mouse over a field that needs translation, then its translation
     in the default locale (usually English) is displayed. If you want to find
     where in the source code a string id comes from, the ids follow
     the rough convention file_name_approximate_english_translation.
-    So you would expect to find admin_controller_login_successful
+    So you would expect to find admin_controller_login_successful
     in the file controllers/admin_controller.php . String ids with the
     prefix db_ (such as the names of activities) are stored in the database.
-    So you cannot find these ids in the source code. The tooltip trick
+    So you cannot find these ids in the source code. The tooltip trick
     mentioned above does not work for database string ids.</p>
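    <p>As a concrete sketch of this mechanism (the string id and the
    translation below are made up for illustration, not ones shipped with
    Yioop), a view might contain a call such as:</p>
    <pre>
    // my_view_elapsed is a hypothetical string id
    $message = tl('my_view_elapsed', $data['ELAPSED_TIME']);
    </pre>
    <p>A locale's matching translation entry would then need one %s in it,
    for example, "Calculated in %s seconds".</p>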

-    <h3>Adding a stemmer or supporting character
+    <h3>Adding a stemmer or supporting character
     n-gramming for your language</h3>
     <p>Depending on the language you are localizing to, it may make sense
     to write a stemmer for words that will be inserted into the index.
     A stemmer takes inflected or sometimes derived words and reduces
     them to their stem. For instance, jumps and jumping would be reduced to
-    jump in English. As Yioop! crawls it attempts to detect the language of
+    jump in English. As Yioop crawls, it attempts to detect the language of
     a given web page it is processing. If a stemmer exists for this language
-    it will call the stemmer's stem($word) method on each word it extracts
+    it will call the stemmer's stem($word) method on each word it extracts
     from the document before inserting information about it into the index.
-    Similarly, if an end-user is entering a simple conjunctive search query
+    Similarly, if an end-user is entering a simple conjunctive search query
     and a stemmer exists for their language settings, then the query terms will
-    be stemmed before being looked up in the index. Currently, Yioop! comes
+    be stemmed before being looked up in the index. Currently, Yioop comes
     with only an English language stemmer that uses the Porter Stemming
     Algorithm [<a href="#P1980">P1980</a>]. This stemmer is located in the
-    file WORK_DIRECTORY/locale/en-US/resources/tokenizer.php .
-    The [<a href="#P1980">P1980</a>] link
+    file WORK_DIRECTORY/locale/en-US/resources/tokenizer.php .
+    The [<a href="#P1980">P1980</a>] link
     points to a site that has source code for stemmers for many other languages
-    (unfortunately,  not written in PHP). It would not be hard to port these
+    (unfortunately, not written in PHP). It would not be hard to port these
     to PHP and then modify the tokenizer.php file of the
-    appropriate locale folder. For instance, one
-    could modify the file
-    WORK_DIRECTORY/locale/fr-FR/resources/tokenizer.php
-    to contain a class FrStemmer with method
+    appropriate locale folder. For instance, one
+    could modify the file
+    WORK_DIRECTORY/locale/fr-FR/resources/tokenizer.php
+    to contain a class FrStemmer with method
     stem($word) if one wanted to add a stemmer for French.
     </p>
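    <p>A minimal sketch of such a file follows. The class name FrStemmer and
    the stem($word) method come from the description above; the body shown
    here is only a placeholder to illustrate the shape of the class, not a
    real French stemmer:</p>
    <pre>
    // WORK_DIRECTORY/locale/fr-FR/resources/tokenizer.php (sketch)
    class FrStemmer
    {
        static function stem($word)
        {
            // a real stemmer would apply French suffix-stripping rules;
            // this placeholder just normalizes the case of $word
            return mb_strtolower($word, "UTF-8");
        }
    }
    </pre>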
     <p>In addition to supporting the ability to add stemmers, Yioop also
     supports a default technique which can be used in lieu of a stemmer
     called character n-grams. When used this technique segments text into
-    sequences of n characters which are then stored in Yioop! as a term.
+    sequences of n characters which are then stored in Yioop as a term.
     For instance if n were 3 then the word "thunder" would be split
     into "thu", "hun", "und", "nde", and "der" and each of these would be
-    asscociated with the document that contained the word thunder.
-    N-grams are useful for languages like Chinese and Japanese in which
+    associated with the document that contained the word thunder.
+    N-grams are useful for languages like Chinese and Japanese in which
     words in the text are often not separated with spaces. It is also
     useful for languages like German which can have long compound words.
     The drawback of n-grams is that they tend to make the index larger.
@@ -2059,15 +2105,15 @@ OdpRdfArchiveBundle
     WORK_DIRECTORY/locale/LOCALE-TAG/resources/tokenizer.php has a line
     of the form $CHARGRAMS['LOCALE_TAG'] = SOME_NUMBER; This number is
     the length of string to use in doing char-gramming. If you add a
-    language to Yioop! and want to use char gramming merely add a tokenizer.php
+    language to Yioop and want to use char gramming, merely add a tokenizer.php
     to the corresponding locale folder with such a line in it.</p>
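    <p>For instance, a tokenizer.php turning on 2-character gramming for a
    hypothetical zh-CN locale could consist of just the line:</p>
    <pre>
    $CHARGRAMS['zh-CN'] = 2;
    </pre>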
     <h3>Using token_tool.php to improve search performance and relevance
     for your language</h3>
-    <p>configs/token_tool is used to create suggest word dictionaries and 'n'
-    word gram filter files for the Yioop! search engine. To create either of
-    these items, the user puts a source file in Yioop's WORK_DIRECTORY/prepare
-    folder. Suggest word dictionaries are used to supply the content of the
-    dropdown of search terms that appears as a user is entering a query in
+    <p>configs/token_tool.php is used to create suggest word dictionaries and
+    'n'
+    word gram filter files for the Yioop search engine. To create either of
+    these items, the user puts a source file in Yioop's WORK_DIRECTORY/prepare
+    folder. Suggest word dictionaries are used to supply the content of the
+    dropdown of search terms that appears as a user is entering a query in
     Yioop. To make a suggest dictionary one can use a command like:</p>
     <pre>
     php token_tool.php dictionary filename locale endmarker
@@ -2076,26 +2122,26 @@ OdpRdfArchiveBundle
     Here filename should be in the current folder or PREP_DIR and should consist
     of one word per line, locale is the locale (for example, en-US) this suggest
     file is being made for and where a file suggest-trie.txt.gz will be written,
-    and endmarker is the end of word symbol to use in the trie. For example,
-    $ works pretty well.
+    and endmarker is the end of word symbol to use in the trie. For example,
+    $ works pretty well.
     </p>
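    <p>For instance, with a hypothetical word list frequency_en.txt placed
    in the prepare folder, the en-US suggest dictionary could be built
    with:</p>
    <pre>
    php token_tool.php dictionary frequency_en.txt en-US $
    </pre>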
     <p>
-    token_tool.php can also be used to make filter files. A filter file is used
-    to detect when words in a language should be treated as a unit when
-    extracting text during a crawl. For example, Bill Clinton is 2 word gram
-    which should be treated as unit because it is a particular person.
+    token_tool.php can also be used to make filter files. A filter file is used
+    to detect when words in a language should be treated as a unit when
+    extracting text during a crawl. For example, Bill Clinton is a 2 word gram
+    which should be treated as a unit because it is a particular person.
     token_tool.php is run from the command line as:
     </p>
     <pre>
     php token_tool.php filter wiki_file lang locale n extract_type <?php
-    ?>max_to_extract
+    ?>max_to_extract
     </pre>
     <p>
-    where wiki_file is a wikipedia xml file or a bz2  compressed xml file whose
-    urls or wiki page count dump file which will be used to determine the
-    n-grams, lang is an Wikipedia language tag,  locale is the IANA language
+    where wiki_file is a Wikipedia xml file, a bz2 compressed xml file, or a
+    wiki page count dump file, any of which will be used to determine the
+    n-grams, lang is a Wikipedia language tag, locale is the IANA language
     tag of locale to store the results for (if different from lang, for example,
-    en-US versus en for  lang), n is the number of words in a row to consider,
+    en-US versus en for lang), n is the number of words in a row to consider,
     and extract_type specifies which part of the Wikipedia source to extract:
     </p>
     <pre>
@@ -2107,7 +2153,7 @@ OdpRdfArchiveBundle

     <h3>Obtaining data sets for token_tool.php</h3>
     <p>
-    Many word lists are obtainable on the web for free with Creative Commons
+    Many word lists are obtainable on the web for free with Creative Commons
     licenses. A good starting point is:</p>
     <pre>
     <a href="http://en.wiktionary.org/wiki/Wiktionary:Frequency_lists"
@@ -2122,7 +2168,7 @@ OdpRdfArchiveBundle
     >http://dumps.wikimedia.org/other/pagecounts-raw/</a>
     </pre>
     <p>These probably give the best n-gram or all gram results, usually
-    in a matter of minutes; nevertheless, this tool does support trying to
+    in a matter of minutes; nevertheless, this tool does support trying to
     extract  similar data from Wikipedia dumps. This can take hours.</p>
     <p>For Wikipedia dumps, one can go to</p>
     <pre>
@@ -2130,12 +2176,12 @@ OdpRdfArchiveBundle
     >http://dumps.wikimedia.org/enwiki/</a>
     </pre>
     <p>
-    and obtain a dump of the English Wikipedia (similar for other languages).
-    This page lists all the dumps according to date they were taken. Choose any
-    suitable date or the latest. A link with a label such as 20120104/,
-    represents a dump taken on  01/04/2012. Click this link to go in turn to a
+    and obtain a dump of the English Wikipedia (similar for other languages).
+    This page lists all the dumps according to date they were taken. Choose any
+    suitable date or the latest. A link with a label such as 20120104/
+    represents a dump taken on 01/04/2012. Click this link to go in turn to a
     page which has many links based on the type of content you are looking for. For
-    this tool you are interested in files under "Recombine all pages, current
+    this tool you are interested in files under "Recombine all pages, current
     versions only".</p>
     <p>
     Beneath this we might find a link with a name like:</p>
@@ -2146,23 +2192,23 @@ OdpRdfArchiveBundle
     which is a file that could be processed by this tool.
     </p>
     <p><a href="#toc">Return to table of contents</a>.</p>
-    <h2 id='framework'>Building a Site using Yioop! as Framework</h2>
-    <p>The Yioop! code base can serve as the code base for new custom search
-    web sites. The web-app portion of Yioop! uses a model-view-controller (MVC)
-    framework. In this set-up, sub-classes of the Model class should handle
-    file I/O and database function, sub-classes of Views should be responsible
-    for rendering outputs, and sub-classes of the Controller class
+    <h2 id='framework'>Building a Site using Yioop as Framework</h2>
+    <p>The Yioop code base can serve as the code base for new custom search
+    web sites. The web-app portion of Yioop uses a model-view-controller (MVC)
+    framework. In this set-up, sub-classes of the Model class should handle
+    file I/O and database functions, sub-classes of Views should be responsible
+    for rendering outputs, and sub-classes of the Controller class
     do calculations on data received from the web and from the models to give
     the views the data they finally need to render. In the remainder of this
-    section we describe how this framework is implemented in Yioop! and
+    section we describe how this framework is implemented in Yioop and
     how to add code to the WORK_DIRECTORY/app folder to customize things for
-    your site. In this discussion we will use APP_DIR to refer to
-    WORK_DIRECTORY/app and BASE_DIR to refer to the directory where Yioop!
+    your site. In this discussion we will use APP_DIR to refer to
+    WORK_DIRECTORY/app and BASE_DIR to refer to the directory where Yioop
     is installed.</p>

-    <p>The index.php script is the first script run by the Yioop! web app.
+    <p>The index.php script is the first script run by the Yioop web app.
     It has an array $available_controllers which lists the controllers
-    available to the script. The names of the controllers in this array are
+    available to the script. The names of the controllers in this array are
     lower case. Based on whether the $_REQUEST['c'] variable is in this array
     index.php either  loads the file {$_REQUEST['c']}_controller.php or loads
     whatever the default controller is. index.php also checks for the existing
@@ -2172,19 +2218,19 @@ OdpRdfArchiveBundle
     a file which extends the class Controller. Controller files should always
     have names of the form somename_controller.php and the class inside them
     should be named SomenameController. Notice it is Somename rather than
-    SomeName. These general naming conventions are used for models, views, etc.
-    Any Controller subclass has the fields $models, $views, and
+    SomeName. These general naming conventions are used for models, views, etc.
+    Any Controller subclass has the fields $models, $views, and
     $indexing_plugins. For the base class these are empty,
     but for a subclass you create you can set them to be arrays listing the
-    names of the models, views, and indexing_plugins your class uses. Yioop!
+    names of the models, views, and indexing_plugins your class uses. Yioop
     tries to load each of the classes listed in these arrays. For example
     if MyController defined:</p>
     <pre>
     var $views = array("search");
     </pre>
     <p>
-    Then Yioop! would first look for a file: APP_DIR/models/search_view.php
-    to include, if it cannot find such a file then it tries to include
+    Then Yioop would first look for a file APP_DIR/views/search_view.php
+    to include; if it cannot find such a file, then it tries to include
     BASE_DIR/views/search_view.php. So to change the behavior of an existing
     BASE_DIR file one just has a modified copy of the file in the appropriate
     place in your APP_DIR. This holds in general for other program files
@@ -2192,12 +2238,12 @@ OdpRdfArchiveBundle
     we'll discuss those in a moment. Notice because it looks in APP_DIR
     first, you can go ahead and create new controllers, models, views, etc
     which don't exist in BASE_DIR and, by setting the variables up right, get
-    Yioop! to load them. When an instance of the controller
-    class Yioop! is using for a request is created, Yioop! also creates
+    Yioop to load them. When an instance of the controller
+    class Yioop is using for a request is created, Yioop also creates
     an instance of each View, Model and IndexingPlugin associated with that
     controller and sets them as field variables. To refer to the instance of
     SearchView in an instance $mycontroller of MyController we could use the
-    variable $mycontroller-&gt;searchView. For models, we would write
+    variable $mycontroller-&gt;searchView. For models, we would write
     expressions like</p>
 <pre>
     $mycontroller-&gt;mymodelnameModel
@@ -2206,7 +2252,7 @@ OdpRdfArchiveBundle
 <pre>
     $mycontroller-&gt;mypluginnamePlugin
 </pre>
-    <p>Notice in each expression the name of the
+    <p>Notice in each expression the name of the
     particular model or plugin is lower case. Given this way of referring
     to models, a controller can invoke a model's methods to get data out
     of the file system or from a database with expressions like:</p>
@@ -2214,37 +2260,37 @@ OdpRdfArchiveBundle
     $mycontroller-&gt;mymodelnameModel-&gt;someMethod();
 </pre>
     <p>
-    In the above, if the code was within a method in the controller class
+    In the above, if the code was within a method in the controller class
     itself, we would typically write things like:</p>
 <pre>
     $this-&gt;mymodelnameModel-&gt;someMethod();
 </pre>
     <p>
-    A Controller must implement the abstract method
+    A Controller must implement the abstract method
     processRequest. The index.php script after finishing its bootstrap process
-    calls the processRequest method of the Controller it chose to
-    load. If this was your controller, the code in your controller
+    calls the processRequest method of the Controller it chose to
+    load. If this was your controller, the code in your controller
     should make use of data gotten out of
-    the loaded models as well as data from the web request to do some
+    the loaded models as well as data from the web request to do some
     calculations. The results of these calculations you would typically
     put into an associative array $data and then call the base Controller method
     displayView($view, $data). Here $view is whichever loaded view object
     you would like to display.
     </p>
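+    <p>Putting the above conventions together, a minimal controller might
+    look like the following sketch (the class name, model name, and $data
+    key are hypothetical, and details such as cleaning request data are
+    omitted):</p>
+    <pre>
+    &lt;?php
+    // APP_DIR/controllers/myname_controller.php
+    class MynameController extends Controller
+    {
+        // models and views Yioop should load for this controller
+        var $models = array("myname");
+        var $views = array("search");
+
+        function processRequest()
+        {
+            $data = array();
+            // use a loaded model to compute data for the view
+            $data['RESULTS'] = $this-&gt;mynameModel-&gt;someMethod();
+            $this-&gt;displayView($this-&gt;searchView, $data);
+        }
+    }
+    ?&gt;
+    </pre>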
     <p>
-    To complete the picture of how Yioop! eventually produces a web page or
+    To complete the picture of how Yioop eventually produces a web page or
     other output, we now describe how subclasses of the View class work.
-    Subclasses of View have four fields
+    Subclasses of View have four fields
     $pages, $layout, $helpers, and $elements. In the base class, $pages,
     $helpers, and $elements are empty arrays and the $layout is an empty
-    string. A subclass of View has at most one Layout and it is used
+    string. A subclass of View has at most one Layout and it is used
     for rendering the header and footer of the page. It is included and
-    instantiated by setting $layout to be the name of the layout one wants to
+    instantiated by setting $layout to be the name of the layout one wants to
     load. For example, $layout="web"; would load either the
-    file APP_DIR/views/layouts/web_layout.php or
+    file APP_DIR/views/layouts/web_layout.php or
     BASE_DIR/views/layouts/web_layout.php. This file is expected to have in it
     a class WebLayout extending Layout. The constructor of a Layout
-    take as argument a view which it sets to an instance variabe.
+    takes as argument a view which it sets to an instance variable.
     The way layouts get drawn is
     as follows: When the controller calls displayView($view, $data), this method
     does some initialization and then calls the render($data) of the base
@@ -2254,7 +2300,7 @@ OdpRdfArchiveBundle
     draws the footer.
     </p>
     <p>
-    The files loaded by the constructor of View for
+    The files loaded by the constructor of View for
     each of $pages, $helpers, and $elements follow the same kind of pattern
     as described above for Controller. The files loaded in the case of
     $helpers are expected to be sub-classes of Helper and those of $elements
@@ -2262,11 +2308,11 @@ OdpRdfArchiveBundle
     $view, which had $helpers = array("somehelper"); would get an instance
     variable $view-&gt;somehelperHelper and similarly, for elements. Each
     file loaded in because of the $pages array, on the other hand, is expected
-    to be a static portion of a web page in
+    to be a static portion of a web page in
     WORK_DIRECTORY/locale/current-IANA-tag/pages.
     For example, $pages=array("about"); would look for an about.thtml file
     in this folder, load it and assign the string contents
-    to $page_objects["about"]. So using Yioop!'s shorthand for echo. A view
+    to $page_objects["about"]. So using Yioop's shorthand for echo. A view
     could render this page with the command:</p>
     <pre>
     e($this->page_objects["about"]);
@@ -2278,14 +2324,14 @@ OdpRdfArchiveBundle
     has a render($id, $name, $options, $selected) method and is used to
     draw select drop-downs.
     </p>
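+    <p>Pulling the View conventions together, a subclass declaring all four
+    fields might look like the following sketch (all names other than the
+    four field names are hypothetical):</p>
+    <pre>
+    &lt;?php
+    // APP_DIR/views/myname_view.php
+    class MynameView extends View
+    {
+        // draw the page header and footer with web_layout.php
+        var $layout = "web";
+        // loads somehelper_helper.php as $this-&gt;somehelperHelper
+        var $helpers = array("somehelper");
+        // loads someelement_element.php as $this-&gt;someelementElement
+        var $elements = array("someelement");
+        // loads the locale's about.thtml into $this-&gt;page_objects["about"]
+        var $pages = array("about");
+    }
+    ?&gt;
+    </pre>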
-    <p>When rendering a View or Element one often has css, scripts, images,
+    <p>When rendering a View or Element one often has css, scripts, images,
     videos, objects, etc. In BASE_DIR, the targets of these tags would typically
-    be stored in the css, scripts, or resources folders.
+    be stored in the css, scripts, or resources folders.
     The APP_DIR/css, APP_DIR/scripts, and APP_DIR/resources folders are
     a natural place for them in your customized site. One wrinkle,
     however, is that APP_DIR, unlike BASE_DIR, doesn't have to be under
     your web server's DOCUMENT_ROOT. So how does one refer in a link
-    to these folders? To this one uses Yioop!'s ResourceController class
+    to these folders? To do this, one uses Yioop's ResourceController class
     which can be invoked by a link like:</p>
     <pre>
     &lt;img src="?c=resource&amp;a=get&amp;n=myicon.png&amp;f=resources" /&gt;
@@ -2294,36 +2340,36 @@ OdpRdfArchiveBundle
     Here c=resource specifies the controller, a=get specifies the activity --
     to get a file, n=myicon.png specifies we want the file myicon.png --
     the value of n is cleaned to make sure it is a filename before being used,
-    and f=resources specifies the folder -- f is allowed to be one of
-    css, script, or resources. This would get the file
+    and f=resources specifies the folder -- f is allowed to be one of
+    css, scripts, or resources. This would get the file
     APP_DIR/resources/myicon.png .
     </p>
     <p>
-    This completes our description of the Yioop! framework and how to
+    This completes our description of the Yioop framework and how to
     build a new site using it. It should be pointed out that code in
     the APP_DIR can be localized using the same mechanism as in BASE_DIR.
-    More details on this can be found in the section on
-    <a href="#localizing">Localizing Yioop!</a>.
+    More details on this can be found in the section on
+    <a href="#localizing">Localizing Yioop</a>.
     </p>
     <p><a href="#toc">Return to table of contents</a>.</p>
-    <h2 id='embedding'>Embedding Yioop! in an Existing Site</h2>
-    <p>One use-case for Yioop! is to use it to serve search result for your
+    <h2 id='embedding'>Embedding Yioop in an Existing Site</h2>
+    <p>One use-case for Yioop is to use it to serve search results for your
     existing site. There are three common ways to do this: (1)
-    On your site have a web-form or links with your installation of Yioop!
-    as their target and let Yioop! format the results. (2) Use the
-    same kind of form or links, but request an OpenSearch RSS Response from
-    Yioop! and then you format the results and display the results within
-    your site. (3) Your site makes functions calls of the Yioop! Search
+    On your site have a web-form or links with your installation of Yioop
+    as their target and let Yioop format the results. (2) Use the
+    same kind of form or links, but request an OpenSearch RSS Response from
+    Yioop and then you format the results and display the results within
+    your site. (3) Your site makes function calls of the Yioop Search
     API and gets either PHP arrays or a string back and then does what it
     wants with the results. For access methods (1) and (2) it is possible to
-    have Yioop! on an different machine so that it doesn't consume your main
+    have Yioop on a different machine so that it doesn't consume your main
     web-site's machine's resources. As we mentioned in the configuration section
     it is possible to disable each of these access paths from within the Admin
     portion of the web-site. This might be useful for instance if you are using
     access methods (2) or (3) and don't want users to be able to access the
-    Yioop! search results via its built in web form. We will now spend a moment
+    Yioop search results via its built in web form. We will now spend a moment
     to look at each of these access methods in more detail...</p>
-    <h3>Accessing Yioop! via a Web Form</h3>
+    <h3>Accessing Yioop via a Web Form</h3>
     <p>A very minimal code snippet for such a
     form would be:</p>
     <pre>
@@ -2335,38 +2381,38 @@ OdpRdfArchiveBundle
 &lt;/form&gt;
     </pre>
     <p>In the above form, you should change YIOOP_LOCATION to your instance of
-    Yioop!'s web location, TIMESTAMP_OF_CRAWL_YOU_WANT should be the Unix
-    timestamp that appears in the name of the IndexArchive folder that you want
-    Yioop! to serve results from, LOCALE_TAG should be the locale you want
-    results displayed in, for example, en-US for American English. In addition,
+    Yioop's web location, TIMESTAMP_OF_CRAWL_YOU_WANT should be the Unix
+    timestamp that appears in the name of the IndexArchive folder that you want
+    Yioop to serve results from, LOCALE_TAG should be the locale you want
+    results displayed in, for example, en-US for American English. In addition
     to embedding this form on some page on your site, you would
     probably want to change the resources/yioop.png image to something more
     representative of your site. You might also want to edit the file
-    views/search_view.php to give a link back to your site from the
+    views/search_view.php to give a link back to your site from the
     search results.</p>
     <p>If you had a form such as above, clicking Search would take you
     to the URL:</p>
 <pre>
     YIOOP_LOCATION?its=TIMESTAMP_OF_CRAWL_YOU_WANT&amp;l=LOCALE_TAG&amp;q=QUERY
 </pre>
-    <p>where QUERY was what was typed in the search form. Yioop! supports two
+    <p>where QUERY is what was typed in the search form. Yioop supports two
     other kinds of queries: Related sites queries and cache look-up queries.
     The related query format is:</p>
 <pre>
     YIOOP_LOCATION?its=TIMESTAMP_OF_CRAWL_YOU_WANT&amp;l=LOCALE_TAG&amp;<?php
     ?>a=related&amp;arg=URL
 </pre>
-    <p>where URL is the url that you are looking up related URLs for. To do a
-    look up of the Yioop! cache of a web page the url format is:</p>
+    <p>where URL is the url that you are looking up related URLs for. To do a
+    look-up of the Yioop cache of a web page, the url format is:</p>
 <pre>
     YIOOP_LOCATION?its=TIMESTAMP_OF_CRAWL_YOU_WANT&amp;l=LOCALE_TAG&amp;<?php
     ?>q=QUERY&amp;a=cache&amp;arg=URL
 </pre>
     <p>Here the terms listed in QUERY will be styled in different colors in the
     web page that is returned; URL is the url of the web page you want to look
-    up in the cache.
+    up in the cache.
     </p>
-    <h3>Accessing Yioop! and getting and OpenSearch RSS Response</h3>
+    <h3>Accessing Yioop and getting an OpenSearch RSS Response</h3>
     <p>The same basic urls as above can return RSS results simply by appending
     to the end of them &amp;f=rss. This of course only makes sense for
     usual and related url queries -- cache queries return web-pages not
@@ -2378,7 +2424,7 @@ OdpRdfArchiveBundle
 xmlns:atom="http://www.w3.org/2005/Atom"
 &gt;
     &lt;channel&gt;
-        &lt;title&gt;PHP Search Engine - Yioop! : art&lt;/title&gt;
+        &lt;title&gt;PHP Search Engine - Yioop : art&lt;/title&gt;
         &lt;language&gt;en-US&lt;/language&gt;
         &lt;link&gt;http://localhost/git/yioop/?f=rss&amp;amp;q=art&amp;<?php
     ?>amp;its=1317152828&lt;/link&gt;
@@ -2386,18 +2432,18 @@ xmlns:atom="http://www.w3.org/2005/Atom"
         &lt;opensearch:totalResults&gt;1105&lt;/opensearch:totalResults&gt;
         &lt;opensearch:startIndex&gt;0&lt;/opensearch:startIndex&gt;
         &lt;opensearch:itemsPerPage&gt;10&lt;/opensearch:itemsPerPage&gt;
-        &lt;atom:link rel="search" type="application/opensearchdescription+xml"
+        &lt;atom:link rel="search" type="application/opensearchdescription+xml"
             href="http://localhost/git/yioop/yioopbar.xml"/&gt;
         &lt;opensearch:Query role="request" searchTerms="art"/&gt;

                 &lt;item&gt;
-                &lt;title&gt; An Online Fine Art Gallery U Can Buy Art  -
+                &lt;title&gt; An Online Fine Art Gallery U Can Buy Art  -
                 Buy Fine Art Online&lt;/title&gt;

                 &lt;link&gt;http://www.ucanbuyart.com/&lt;/link&gt;
-                &lt;description&gt; UCanBuyArt.com is an online art gallery
-                and dealer designed... art gallery and dealer designed for art
-                sales of high quality and original... art sales of high quality
+                &lt;description&gt; UCanBuyArt.com is an online art gallery
+                and dealer designed... art gallery and dealer designed for art
+                sales of high quality and original... art sales of high quality
                 and original art from renowned artists. Art&lt;/description&gt;
                 &lt;/item&gt;
                 ...
@@ -2406,21 +2452,21 @@ xmlns:atom="http://www.w3.org/2005/Atom"

 &lt;/rss&gt;
 </pre>
-    <p>Notice the opensearch: tags tell us the totalResults, startIndex and
+    <p>Notice the opensearch: tags tell us the totalResults, startIndex and
     itemsPerPage. The opensearch:Query tag tells us what the search terms
     were.</p>
-    <h3>Accessing Yioop! via the Function API</h3>
-    <p>The last way we will consider to get search results out of Yioop! is
-    via its function API. The Yioop! Function API consists of the following
+    <h3>Accessing Yioop via the Function API</h3>
+    <p>The last way we will consider to get search results out of Yioop is
+    via its function API. The Yioop Function API consists of the following
     three methods in controllers/search_controller.php :
     </p>
     <pre>
-    public function queryRequest($query, $results_per_page, $limit = 0)
+    function queryRequest($query, $results_per_page, $limit = 0)

-    public function relatedRequest($url, $results_per_page, $limit = 0,
+    function relatedRequest($url, $results_per_page, $limit = 0,
         $crawl_time = 0)

-    public function cacheRequest($url, $highlight=true, $terms ="",
+    function cacheRequest($url, $highlight=true, $terms ="",
         $crawl_time = 0)
     </pre>
     <p>These methods handle basic queries, related queries, and cache of
@@ -2433,24 +2479,24 @@ xmlns:atom="http://www.w3.org/2005/Atom"
     these methods as well as how to extract results from what is returned
     can be found in the file examples/search_api.php .</p>
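+    <p>For instance, assuming the bootstrapping done in
+    examples/search_api.php has been carried out, so that one has a
+    SearchController instance $controller, a basic query might be issued
+    as in this sketch:</p>
+    <pre>
+    // get ten results for the query "chris pollett", starting at result 0
+    $results = $controller-&gt;queryRequest("chris pollett", 10, 0);
+    // $results is a PHP array of summaries your site can format as it likes
+    </pre>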
     <p><a href="#toc">Return to table of contents</a>.</p>
-    <h2 id='customizing'>Customizing Yioop!</h2>
+    <h2 id='customizing'>Customizing Yioop</h2>
     <p>One advantage of an open-source project is that you have complete
-    access to the source code. Thus, you can modify Yioop! to fit in
+    access to the source code. Thus, you can modify Yioop to fit in
     with your existing project, or feel free to add new features to
-    Yioop! In this section, we look a little bit at some common ways you
-    might try to modify Yioop! as well as ways to examine the output of a
+    Yioop. In this section, we look a little bit at some common ways you
+    might try to modify Yioop as well as ways to examine the output of a
     crawl in a more technical manner. If you decide to modify the source code
-    it is recommended you look at the <a
+    it is recommended you look at the <a
     href="#files">Summary of Files and Folders</a> above again, as well
-    as look at the <a href="http://www.seekquarry.com/yioop-docs/">online
-    Yioop! documentation</a>.</p>
+    as look at the <a href="http://www.seekquarry.com/yioop-docs/">online
+    Yioop documentation</a>.</p>

     <h3>Handling new File Types</h3>
-    <p>One relatively easy enhancement to Yioop! would be to enhance
+    <p>One relatively easy enhancement to Yioop would be to improve
     the way it processes an existing file type or to get it to process
-    new file types. Yioop! was written from scratch without dependencies
+    new file types. Yioop was written from scratch without dependencies
     on existing projects. So the PHP processors for Microsoft
-    file formats and for PDF are only approximate. These
+    file formats and for PDF are only approximate. These
     processors can be found in lib/processors. To write your own
     processor, you should extend either the TextProcessor or ImageProcessor
     class. You then need to write in your subclass a static method
@@ -2461,26 +2507,26 @@ xmlns:atom="http://www.w3.org/2005/Atom"
     <pre>
     $summary['TITLE'] = a title for the document
     $summary['DESCRIPTION'] = a text summary extracted from the document
-    $summary['LINKS'] = an array of links (canonical not relative) extracted
+    $summary['LINKS'] = an array of links (canonical not relative) extracted
         from the document.
     </pre>
     <p>
     A good reference implementation of a TextProcessor subclass can be found in
     html_processor.php. If you are trying to support a new file type, to get
-    Yioop! to use your processor you need to edit the configs/config.php
-    file. In config.php you should add the extension of the file type
+    Yioop to use your processor you need to edit the configs/config.php
+    file. In config.php you should add the extension of the file type
     you are going to process to the array $INDEXED_FILE_TYPES. You will
     also need to add an entry to the $PAGE_PROCESSORS array of the
     form "new_mime_type_handle" =&gt; "NewProcessor" .
     </p>
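+    <p>For instance, to wire up a hypothetical NewProcessor for a made-up
+    extension foo with mime type application/foo, the additions to
+    config.php would look roughly like the following (in practice one edits
+    the existing array definitions in place):</p>
+    <pre>
+    // in configs/config.php
+    $INDEXED_FILE_TYPES[] = "foo"; // crawl and index .foo files
+    $PAGE_PROCESSORS["application/foo"] = "NewProcessor";
+    </pre>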
     <p>If your processor is cool, only relies on code you wrote, and you
-    want to contribute it back to the Yioop!, please feel free to
+    want to contribute it back to Yioop, please feel free to
     e-mail it to chris@pollett.org .</p>
     <h3>Using a Different Database Management System (DBMS)</h3>
-    <p>Yioop! currently supports Sqlite2, Sqlite3, and MySql databases.
-    To add support for a different DBMS, you would need to write a new subclass
+    <p>Yioop currently supports Sqlite2, Sqlite3, and MySql databases.
+    To add support for a different DBMS, you would need to write a new subclass
     of the DatasourceManager abstract class. The current subclasses can be
-    found in models/datasources. Yioop! relies on pretty vanilla SQL;
+    found in models/datasources. Yioop relies on pretty vanilla SQL;
     however, it does make use of the fact that some of its tables have
     AUTOINCREMENT columns. This can be simulated in Oracle and DB2 using
     the more sophisticated sequences and triggers. For Postgres you
@@ -2491,8 +2537,8 @@ xmlns:atom="http://www.w3.org/2005/Atom"

     <h3>Writing an Indexing Plugin</h3>
     <p>An indexing plugin provides a way that an advanced end-user
-    can extend the indexing capabilities of Yioop!. Bundled with
-    Yioop! is an example recipe indexing plugin which
+    can extend the indexing capabilities of Yioop. Bundled with
+    Yioop is an example recipe indexing plugin which
     can serve as a guide for writing your own plugin. It is
     found in the folder lib/indexing_plugins. This recipe
     plugin is used to detect food recipes which occur on pages during a crawl.
@@ -2508,60 +2554,60 @@ xmlns:atom="http://www.w3.org/2005/Atom"
     one go, but one could easily imagine reading through the list of recipes
     in batches of the amount that could fit in memory in one go.
     </p>
-    <p>The recipe plugin illustrates the kinds of things that can be
+    <p>The recipe plugin illustrates the kinds of things that can be
     written using indexing plugins. To make your own plugin, you
     would need to write a subclass of the class IndexingPlugin with a
     file name of the form mypluginname_plugin.php. Then you would need
     to put this file in the folder lib/indexing_plugins. In the file
     configs/config.php you would need to add the string "mypluginname" to
     the array $INDEXING_PLUGINS. To properly subclass IndexingPlugin,
-    your class needs to implement four methods:
-    pageProcessing($page, $url), postProcessing($index_name),
+    your class needs to implement four methods:
+    pageProcessing($page, $url), postProcessing($index_name),
     getProcessors(), getAdditionalMetaWords(). If your plugin needs
     to use any page processor or model classes, you should modify the
     $processors and $model instance array variables of your plugin to
     list the ones you need. During a web crawl, after a fetcher has downloaded
     a batch of web pages, it uses a page's mimetype to determine a page
     processor class to extract summary data from that page. The page processors
-    that Yioop! implements can be found in the folder lib/processors. They
+    that Yioop implements can be found in the folder lib/processors. They
     have file names of the form someprocessorname_processor.php. As a crawl
     proceeds, your plugin will typically be called to do further processing
-    of a page only in addition to some of these processors. The static method
+    of a page in addition to only some of these processors. The static method
     getProcessors() should return an array of the form array(
     "someprocessorname1", "someprocessorname2", ...), listing the processors
     that your plugin will do additional processing of documents for.
-    A page processor has a method handle($page, $url) called by Yioop!
+    A page processor has a method handle($page, $url) called by Yioop
     with a string $page of a downloaded document and a string $url of where it
     was downloaded from. This method first calls the process($page, $url)
     method of the processor to do initial summary extraction and then calls
     method pageProcessing($page, $url) of each indexing_plugin associated with
     the given processor. A pageProcessing($page, $url) method is expected
-    to return an array of subdoc arrays found on the given page. Each subdoc
-    array should haves a CrawlConstants::TITLE and a CrawlConstants::DESCRIPTION.
-    The handle method of a processor will add to each subdoc the
-    fields: CrawlConstants::LANG, CrawlConstants::LINKS, CrawlConstants::PAGE,
-    CrawlConstants::SUBDOCTYPE. The SUBDOCTYPE is the name of the plugin.
-    The resulting "micro-document" is inserted by Yioop! into the index
-    under the word nameofplugin:all . After the crawl is over, Yioop!
+    to return an array of subdoc arrays found on the given page. Each subdoc
+    array should have a CrawlConstants::TITLE and a
+    CrawlConstants::DESCRIPTION. The handle method of a processor will add to
+    each subdoc the fields: CrawlConstants::LANG, CrawlConstants::LINKS,
+    CrawlConstants::PAGE, CrawlConstants::SUBDOCTYPE. The SUBDOCTYPE is the
+    name of the plugin. The resulting "micro-document" is inserted by Yioop into
+    the index under the word nameofplugin:all . After the crawl is over, Yioop
     will call the postProcessing($index_name) method of each indexing plugin
     that was in use. Here $index_name is the timestamp of the crawl. Your
     plugin can do whatever post processing it wants in this method.
     For example, the recipe plugin does searches of the index and uses
     the results of these searches to inject new meta-words into the index.
-    In order for Yioop! to be aware of the meta-words you are adding, you
-    need to implement the method getAdditionalMetaWords().
+    In order for Yioop to be aware of the meta-words you are adding, you
+    need to implement the method getAdditionalMetaWords().
     Also, the web snippet you might want in the search results for things
     like recipes might be longer or shorter than a typical result snippet.
-    The getAdditionalMetaWords() method also tells Yioop! this information.
-    For example, for the recipe plugin, getAdditionalMetaWords() returns
+    The getAdditionalMetaWords() method also tells Yioop this information.
+    For example, for the recipe plugin, getAdditionalMetaWords() returns
     the associative array:</p>
     <pre>
-    array("recipe:" => HtmlProcessor::MAX_DESCRIPTION_LEN,
+    array("recipe:" => HtmlProcessor::MAX_DESCRIPTION_LEN,
             "ingredient:" => HtmlProcessor::MAX_DESCRIPTION_LEN);
     </pre>
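+    <p>To pull the pieces together, a skeleton plugin might look like the
+    following sketch. The class name and meta-word are hypothetical, the
+    method bodies only indicate where real logic would go, and the processor
+    name format and which methods are static are assumptions based on the
+    bundled recipe plugin:</p>
+    <pre>
+    &lt;?php
+    // lib/indexing_plugins/myplugin_plugin.php
+    class MypluginPlugin extends IndexingPlugin
+    {
+        // do additional processing for documents handled by these
+        // page processors
+        static function getProcessors()
+        {
+            return array("HtmlProcessor");
+        }
+        function pageProcessing($page, $url)
+        {
+            $subdocs = array();
+            /* scan $page and append to $subdocs, for each item found, an
+               array with CrawlConstants::TITLE and
+               CrawlConstants::DESCRIPTION fields set */
+            return $subdocs;
+        }
+        function postProcessing($index_name)
+        {
+            /* $index_name is the crawl timestamp; search the index and
+               inject any new meta-words here */
+        }
+        static function getAdditionalMetaWords()
+        {
+            return array("myplugin:" =&gt; HtmlProcessor::MAX_DESCRIPTION_LEN);
+        }
+    }
+    ?&gt;
+    </pre>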
     <p>This completes the discussion of how to write an indexing plugin.</p>
     <p><a href="#toc">Return to table of contents</a>.</p>
-    <h2 id='commandline'>Yioop! Command-line Tools</h2>
+    <h2 id='commandline'>Yioop Command-line Tools</h2>
     <h3>Configuring Yioop from the Command-line</h3>
     <p>In a multiple queue server and fetcher setting, one might have web access
     only to the name server machine -- all the other machines might be on
@@ -2577,7 +2623,7 @@ php configure_tool.php
     <p>When launched, this program will display a menu like:</p>
     <pre>

-YIOOP! CONFIGURATION TOOL
+Yioop CONFIGURATION TOOL
 +++++++++++++++++++++++++

 Checking Yioop configuration...
@@ -2601,17 +2647,17 @@ Available Options:
 Please choose an option:
     </pre>
     <p>
-    Except for the Change root password option, these correspond to the
+    Except for the Change root password option, these correspond to the
     different fieldsets on the Configure activity. The command-line forms one
-    one gets from selecting one of these choise let one set the same
+    gets by selecting one of these choices let one set the same
     values as were described earlier in the
     <a href="#installation">Installation</a> section. The change root password
-    option lets one set the account password for root. i.e., the main admin
+    option lets one set the account password for root, i.e., the main admin
     user. On a non-nameserver machine, it is probably simpler to go with
-    a sqlite database, rather than hit on a global mysql database from
+    a sqlite database, rather than hitting a global mysql database from
     each machine. Such a barebones local database set-up would typically
     only have one user, root.</p>
-    <p>Another thing to consider, when configuring a collection of Yioop!
+    <p>Another thing to consider, when configuring a collection of Yioop
     machines in such a setting, is that by default, under Search Access Set-up,
     subsearch is unchecked. This means the RSS feeds won't be downloaded
     hourly on such machines. If one unchecks this, they will. This may or
@@ -2619,10 +2665,10 @@ Please choose an option:
     downloading of RSS feeds across several machines -- any machine in
     a Yioop cluster can send media news results in response to a search query.
     </p>
-    <h3>Examining the contents of WebArchiveBundle's and
+    <h3>Examining the contents of WebArchiveBundles and
     IndexArchiveBundles</h3>
     <p>
-    The command-line script bin/arc_tool.php can be use to examine the
+    The command-line script bin/arc_tool.php can be used to examine the
     contents of a WebArchiveBundle or an IndexArchiveBundle. i.e., it gives
     a print out of the web pages or summaries contained therein. It can also
     be used to give information from the headers of these bundles. Finally,
@@ -2634,14 +2680,14 @@ Please choose an option:
 php arc_tool.php info bundle_name //return info about
 //documents stored in archive.

-php arc_tool.php list //returns a list
-//of all the archives in the Yioop! crawl directory, including
-//non-Yioop! archives in the cache/archives sub-folder.
+php arc_tool.php list //returns a list
+//of all the archives in the Yioop crawl directory, including
+//non-Yioop archives in the cache/archives sub-folder.

 php arc_tool.php mergetiers bundle_name max_tier
 //merges tiers of word dictionary into one tier up to max_tier

-php arc_tool.php reindex bundle_name
+php arc_tool.php reindex bundle_name
 //reindex the word dictionary in bundle_name

 php arc_tool.php show bundle_name start num //outputs
@@ -2717,7 +2763,7 @@ ASCII
 ...

 |chris-polletts-macbook-pro:bin:117&gt;php arc_tool.php reindex <?php
-?>IndexData1317414152
+?>IndexData1317414152

 Shard 0
 [Sat, 01 Oct 2011 11:05:17 -0700] Adding shard data to dictionary files...
@@ -2727,15 +2773,15 @@ Final Merge Tiers

 Reindex complete!!
 </pre>
-<p>The mergetiers command is like a partial reindex. It assumes all the shard
+<p>The mergetiers command is like a partial reindex. It assumes all the shard
 words have been added to the dictionary, but that the dictionary
 still has more than one tier (tiers are the result of incremental
-log-merges which are made during the crawling process). The
+log-merges which are made during the crawling process). The
 mergetiers command merges these tiers into one large tier which is
-then usable by Yioop! for query processing.<p>
+then usable by Yioop for query processing.</p>
     <h3>Querying an Index from the command-line</h3>
 <p>The command-line script bin/query_tool.php can be used to query
-indices in the Yioop! WORK_DIRECTORY/cache. This tool can be used
+indices in the Yioop WORK_DIRECTORY/cache. This tool can be used
 on an index regardless of whether or not Apache is running. It can be
 used for long-running queries that might time out when run within a browser,
 to put their results into memcache or filecache. The command-line arguments
@@ -2754,7 +2800,7 @@ The following shows how one could do a query on "Chris Pollett":
 TITLE: ECCC - Pointers to
 URL: http://eccc.hpi-web.de/static/pointers/<?php
 ?>personal_www_home_pages_of_complexity_theorists/
-IPs: 141.89.225.3
+IPs: 141.89.225.3
 DESCRIPTION: Homepage of the Electronic Colloquium on Computational <?php
 ?>Complexity located
 at the Hasso Plattner Institute of Potsdam, Germany Personal WWW pages of
@@ -2769,7 +2815,7 @@ Score: 4.14
 TITLE: ECCC - Pointers to
 URL: http://www.eccc.uni-trier.de/static/pointers/<?php
 ?>personal_www_home_pages_of_complexity_theorists/
-IPs: 141.89.225.3
+IPs: 141.89.225.3
 DESCRIPTION: Homepage of the Electronic Colloquium on Computational <?php
 ?>Complexity located
 at the Hasso Plattner Institute of Potsdam, Germany Personal WWW pages of
@@ -2783,57 +2829,57 @@ Score: 4.03
 .....
 </pre>
 <p>The index the results are returned from is the default index; however,
-all of the Yioop! meta words should work so you can do queries like
+all of the Yioop meta words should work so you can do queries like
 "my_query i:timestamp_of_index_want". Query results depend on the
-kind of language stemmer/char-gramming being used, so French results might be
+kind of language stemmer/char-gramming being used, so French results might be
 better if one specifies fr-FR than if one relies on the default en-US.</p>
     <h2 id="references">References</h2>
     <dl>
 <dt id="APC2003">[APC2003]</dt>
 <dd>Serge Abiteboul and Mihai Preda and Gregory Cobena.
 <a href="http://leo.saclay.inria.fr/publifiles/gemo/GemoReport-290.pdf"
->Adaptive on-line page importance computation</a>.
-In: Proceedings of the 12th international conference on World Wide Web.
+>Adaptive on-line page importance computation</a>.
+In: Proceedings of the 12th international conference on World Wide Web.
 pp.280-290. 2003.
 </dd>
 <dt id="B1970">[B1970]</dt>
-<dd>Bloom, Burton H.
+<dd>Bloom, Burton H.
 <a href="http://dx.doi.org/10.1145%2F362686.362692"
->Space/time trade-offs in hash coding with allowable errors</a>.
+>Space/time trade-offs in hash coding with allowable errors</a>.
 Communications of the ACM Volume 13 Issue 7. pp. 422–426. 1970.
 </dd>
 <dt id="BSV2004">[BSV2004]</dt>
-<dd>Paolo Boldi and  Massimo Santini and Sebastiano Vigna.
+<dd>Paolo Boldi and  Massimo Santini and Sebastiano Vigna.
 <a href="http://vigna.dsi.unimi.it/ftp/papers/ParadoxicalPageRank.pdf"
->Do Your Worst to Make the Best:
-Paradoxical Effects in PageRank Incremental Computations</a>.
+>Do Your Worst to Make the Best:
+Paradoxical Effects in PageRank Incremental Computations</a>.
 Algorithms and Models for the Web-Graph. pp. 168–180. 2004. </dd>
 <dt id='BP1998'>[BP1998]</dt>
-<dd>Brin, S. and Page, L.
+<dd>Brin, S. and Page, L.
 <a  href="http://infolab.stanford.edu/~backrub/google.html"
-    >The Anatomy of a Large-Scale Hypertextual Web Search Engine</a>.
-In: Seventh International World-Wide Web Conference
+    >The Anatomy of a Large-Scale Hypertextual Web Search Engine</a>.
+In: Seventh International World-Wide Web Conference
 (WWW 1998). April 14-18, 1998. Brisbane, Australia. 1998.</dd>
 <dt id='BCC2010'>[BCC2010]</dt>
-<dd>S. Büttcher, C. L. A. Clarke, and G. V. Cormack.
+<dd>S. Büttcher, C. L. A. Clarke, and G. V. Cormack.
 <a href="http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=12307"
 >Information Retrieval: Implementing and Evaluating Search Engines</a>.
 MIT Press. 2010.</dd>
 <dt id="DG2004">[DG2004]</dt>
-<dd>Jeffrey Dean and Sanjay Ghemawat.
+<dd>Jeffrey Dean and Sanjay Ghemawat.
 <a href="http://research.google.com/archive/mapreduce-osdi04.pdf"
->MapReduce: Simplified Data Processing on Large Clusters</a>.
+>MapReduce: Simplified Data Processing on Large Clusters</a>.
 OSDI'04: Sixth Symposium on Operating System Design and Implementation. 2004.</dd>
 <dt id="GGL2003">[GGL2003]</dt>
-<dd>Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung.
+<dd>Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung.
 <a href="http://research.google.com/archive/gfs-sosp2003.pdf
-">The Google File System</a>.
+">The Google File System</a>.
 19th ACM Symposium on Operating Systems Principles. 2003.</dd>
 <dt id='H2002'>[H2002]</dt>
-<dd>T. Haveliwala.
+<dd>T. Haveliwala.
 <a href="
 http://infolab.stanford.edu/~taherh/papers/topic-sensitive-pagerank.pdf"
->Topic-Sensitive PageRank</a>. Proceedings of the Eleventh International
+>Topic-Sensitive PageRank</a>. Proceedings of the Eleventh International
 World Wide Web Conference (Honolulu, Hawaii). 2002.</dd>
 <dt id="KSV2010">[KSV2010]</dt>
 <dd>Howard Karloff, Siddharth Suri, and Sergei Vassilvitskii.
@@ -2848,73 +2894,72 @@ CommerceNet Labs Technical Report 04. 2004.</dd>
 <dd>Jimmy Lin and Chris Dyer.
 <a href="http://www.umiacs.umd.edu/~jimmylin/MapReduce-book-final.pdf"
 >Data-Intensive Text Processing with MapReduce</a>.
-Synthesis Lectures on Human Language Technologies.
+Synthesis Lectures on Human Language Technologies.
 Morgan and Claypool Publishers. 2010.</dd>
 <dt id="LM2006">[LM2006]</dt>
-<dd>Amy N. Langville and Carl D. Meyer.
+<dd>Amy N. Langville and Carl D. Meyer.
 <a  href="http://press.princeton.edu/titles/8216.html"
 >Google's PageRank and Beyond</a>.
 Princeton University Press. 2006.</dd>
 <dt id="MKSR2004">[MRS2008]</dt>
-<dd>C. D. Manning, P. Raghavan and H. Schütze.
+<dd>C. D. Manning, P. Raghavan and H. Schütze.
 <a href="http://nlp.stanford.edu/IR-book/information-retrieval-book.html"
->Introduction to Information Retrieval</a>.
+>Introduction to Information Retrieval</a>.
 Cambridge University Press. 2008.</dd>
 <dt id="MKSR2004">[MKSR2004]</dt>
-<dd>G. Mohr, M. Kimpton, M. Stack, and I.Ranitovic.
+<dd>G. Mohr, M. Kimpton, M. Stack, and I.Ranitovic.
 <a href="http://iwaw.europarchive.org/04/Mohr.pdf"
 >Introduction to Heritrix, an archival quality web crawler</a>.
 4th International Web Archiving Workshop. 2004. </dd>
 <dt id='P1997a'>[P1997a]</dt>
-<dd>J. Peek.
+<dd>J. Peek.
 Summary of the talk: <a href="
 http://www.usenix.org/publications/library/proceedings/ana97/
-summaries/monier.html">The AltaVista Web Search Engine</a> by Louis Monier.
-USENIX Annual Technical Conference Anaheim. California. ;login: Volume 22.
+summaries/monier.html">The AltaVista Web Search Engine</a> by Louis Monier.
+USENIX Annual Technical Conference Anaheim. California. ;login: Volume 22.
 Number 2. April 1997.</dd>
 <dt id='P1997b'>[P1997b]</dt>
-<dd>J. Peek.
+<dd>J. Peek.
 Summary of the talk: <a href="
 http://www.usenix.org/publications/library/proceedings/
-ana97/summaries/brewer.html">The Inktomi Search Engine</a> by Louis Monier.
-USENIX Annual Technical Conference. Anaheim, California. ;login: Volume 22.
+ana97/summaries/brewer.html">The Inktomi Search Engine</a> by Eric Brewer.
+USENIX Annual Technical Conference. Anaheim, California. ;login: Volume 22.
 Number 2. April 1997.</dd>
 <dt id="P1994">[P1994]</dt>
 <dd>B. Pinkerton.
 <a href="http://web.archive.org/web/20010904075500/http://archive.ncsa.uiuc.edu/
 SDG/IT94/Proceedings/Searching/pinkerton/WebCrawler.html"
->Finding what people want: Experiences with the WebCrawler</a>.
-In Proceedings of the First World Wide Web Conference, Geneva, Switzerland.
+>Finding what people want: Experiences with the WebCrawler</a>.
+In Proceedings of the First World Wide Web Conference, Geneva, Switzerland.
 1994.</dd>
 <dt id="P1980">[P1980]</dt>
-<dd>M.F. Porter.
+<dd>M.F. Porter.
 <a href="http://tartarus.org/~martin/PorterStemmer/def.txt"
->An algorithm for suffix stripping.</a>
+>An algorithm for suffix stripping.</a>
 Program. Volume 14 Issue 3. 1980. pp 130−137.
-On the same website, there are <a
+On the same website, there are <a
 href="http://snowball.tartarus.org/">stemmers for many other languages</a>.</dd>
 <dt id='PDGQ2006'>[PDGQ2006]</dt>
-<dd>Rob Pike, Sean Dorward, Robert Griesemer, Sean Quinlan.
+<dd>Rob Pike, Sean Dorward, Robert Griesemer, Sean Quinlan.
 <a href="http://research.google.com/archive/sawzall-sciprog.pdf"
->Interpreting the Data: Parallel Analysis with Sawzall</a>.
-Scientific Programming Journal. Special Issue on Grids and Worldwide Computing
+>Interpreting the Data: Parallel Analysis with Sawzall</a>.
+Scientific Programming Journal. Special Issue on Grids and Worldwide Computing
 Programming Models and Infrastructure.Volume 13. Issue 4. 2006. pp.227-298.</dd>
 <dt id="W2009">[W2009]</dt>
-<dd>Tom White.
+<dd>Tom White.
 <a href="http://www.amazon.com/gp/
 product/1449389732/ref=pd_lpo_k2_dp_sr_1?pf_rd_p=486539851&
 pf_rd_s=lpo-top-stripe-1&pf_rd_t=201&pf_rd_i=0596521979&pf_rd_m=ATVPDKIKX0DER&
-pf_rd_r=0N5VCGFDA7V7MJXH69G6">Hadoop: The Definitive Guide</a>.
+pf_rd_r=0N5VCGFDA7V7MJXH69G6">Hadoop: The Definitive Guide</a>.
 O'Reilly. 2009.</dd>
-<dt id="ZCTSR2004">[ZCTSR2004]</dt>
-<dd>Hugo Zaragoza, Nick Craswell, Michael Taylor,
-Suchi Saria, and Stephen Robertson.
-<a
+<dt id="ZCTSR2004">[ZCTSR2004]</dt>
+<dd>Hugo Zaragoza, Nick Craswell, Michael Taylor,
+Suchi Saria, and Stephen Robertson.
+<a
 href="http://trec.nist.gov/pubs/trec13/papers/microsoft-cambridge.web.hard.pdf"
->Microsoft Cambridge at TREC-13: Web and HARD tracks</a>.
+>Microsoft Cambridge at TREC-13: Web and HARD tracks</a>.
 In Proceedings of the 13th Annual Text Retrieval Conference. 2004.</dd>
     </dl>
     <p><a href="#toc">Return to table of contents</a>.</p>
 </div>

-
diff --git a/en-US/pages/downloads.thtml b/en-US/pages/downloads.thtml
index d3b7ac8..5465ad4 100755
--- a/en-US/pages/downloads.thtml
+++ b/en-US/pages/downloads.thtml
@@ -1,31 +1,38 @@
 <h1>Downloads</h1>
-<h2>Yioop! Releases</h2>
-<p>The Yioop! source code is still at an alpha stage. </p>
+<h2>Yioop Releases</h2>
+<p>The Yioop source code is still at an alpha stage. </p>
 <ul>
-<li><a href="http://www.seekquarry.com/viewgit/?a=archive&amp;p=yioop&amp;h=3ba7c0901b792891b6b279732e5184668b294e44&amp;hb=8b105749c471bbfe97df88e84df8f9c239027a01&amp;t=zip"
+<li><a href="http://www.seekquarry.com/viewgit/?
+a=archive&amp;p=yioop&amp;h=3ba7c0901b792891b6b279732e5184668b294e44&amp;
+hb=8b105749c471bbfe97df88e84df8f9c239027a01&amp;t=zip"
     >Version 0.90-ZIP</a></li>
-<li><a href="http://www.seekquarry.com/viewgit/?a=archive&amp;p=yioop&amp;h=1be2b50b8436998ce8d2d41f5db3b470610aa817&amp;hb=6fc863b1aaf26d8a0abf49a2aad9c7ce440ea307&amp;t=zip"
+<li><a href="http://www.seekquarry.com/viewgit/?
+a=archive&amp;p=yioop&amp;h=1be2b50b8436998ce8d2d41f5db3b470610aa817&amp;
+hb=6fc863b1aaf26d8a0abf49a2aad9c7ce440ea307&amp;t=zip"
     >Version 0.88-ZIP</a></li>
 </ul>
 <h2>Installation</h2>
-<p>The documentation page has information about the
-<a href="?c=main&p=documentation#requirements"
->requirements</a> of and
-<a href="?c=main&p=documentation#installation"
->installation procedure</a> for Yioop!. The
-<a href="?c=main&amp;p=install">Install Guides</a> page
+<p>The documentation page has information about the
+<a href="?c=main&amp;p=documentation#requirements"
+>requirements</a> of and
+<a href="?c=main&amp;p=documentation#installation"
+>installation procedure</a> for Yioop. The
+<a href="?c=main&amp;p=install">Install Guides</a> page
 explains how to get Yioop to work in some common settings.</p>
 <h2>Git Repository / Contributing</h2>
-<p>The Yioop! git repository allows anonymous read-only access. If you would to
-contribute to Yioop!, just do a clone of the most recent code,
-make your changes, do a pull, and make a patch. For example, to clone the
+<p>The Yioop git repository allows anonymous read-only access. If you would
+like to contribute to Yioop, just do a clone of the most recent code,
+make your changes, do a pull, and make a patch. For example, to clone the
 repository  assuming you have git, type:</p>
 <p><b>git clone https://seekquarry.com/git/yioop.git</b></p>
 <p>
-Create/update an issue in the <a href="/mantis/">Yioop! issue tracker</a>
-describing what your patch solves and upload the patch. To contribute
+The <a href="?c=main&amp;p=coding">Yioop Coding Guidelines</a> explain
+the form your code should be in when making a patch as well as
+how to create patches. You can create/update an issue in the
+<a href="/mantis/">Yioop issue tracker</a>
+describing what your patch solves and upload your patch. To contribute
 localizations, you can use the GUI interface in your own
-copy of Yioop! to enter in your localizations. Next locate in the locale
-folder of your Yioop! work directory the locale tag of the
+copy of Yioop to enter in your localizations. Next locate in the locale
+folder of your Yioop work directory the locale tag of the
 language you added translations for. Within this folder is a configure.ini
 file; just make an issue in the issue tracker and upload this file there.</p>
diff --git a/en-US/pages/home.thtml b/en-US/pages/home.thtml
index 5dd23e4..7f91822 100755
--- a/en-US/pages/home.thtml
+++ b/en-US/pages/home.thtml
@@ -1,38 +1,38 @@
 <h1>Open Source Search Engine Software!</h1>
-<p>SeekQuarry is the parent site for <a href="http://www.yioop.com/">Yioop!</a>.
-Yioop! is a <a href="http://gplv3.fsf.org/">GPLv3</a>, open source, PHP search
-engine. Yioop! can be configured  as either a general purpose
-search engine for the whole web or it can be configured to provide search
+<p>SeekQuarry is the parent site for <a href="http://www.yioop.com/">Yioop</a>.
+Yioop is a <a href="http://gplv3.fsf.org/">GPLv3</a>, open source, PHP search
+engine. Yioop can be configured as either a general purpose
+search engine for the whole web or it can be configured to provide search
 results for a set of urls or domains.
 </p>
 <h2>Goals</h2>
-<p>Yioop! was designed with the following goals in mind:</p>
+<p>Yioop was designed with the following goals in mind:</p>
 <ul>
-<li><b>Make it easier to obtain personal crawls of the web.</b> Only a web
-server such as Apache and PHP 5.3 or better is needed. Configuration can be
+<li><b>Make it easier to obtain personal crawls of the web.</b> Only a web
+server such as Apache and PHP 5.3 or better is needed. Configuration can be
 done using a GUI interface.</li>
 <li><b>Support distributed crawling of the web, if desired.</b> To download
 many web pages quickly, it is useful to have more than one machine when crawling
 the web. If you have several machines at home, simply install the software
-on all the machines you would like to use in a web crawl. In the configuration
-interface give the URL of the machine you would like to serve search results
+on all the machines you would like to use in a web crawl. In the configuration
+interface give the URL of the machine you would like to serve search results
 from. Start at least one queue server and as many fetchers as desired on
 the other machines.</li>
 <li><b>Be fast and online.</b> Yioop is "online" in
-that it creates a word index and document ranking as it crawls rather
+that it creates a word index and document ranking as it crawls rather
 than ranking as a separate step. This keeps the processing done by any
-machine as low as possible so you can still use them for what you bought them
-for. Nevertheless, it is reasonably fast: A test set-up consisting of three
+machine as low as possible so you can still use them for what you bought them
+for. Nevertheless, it is reasonably fast: A test set-up consisting of three
 Mac Minis, each with 8GB RAM, a queue_server, and five fetchers, adds
 100 million pages to its index every four weeks.
 </li>
-<li><b>Make it easy to archive crawls.</b> Crawls are stored in timestamped
-folders that can be moved around zipped, etc. Through the admin interface you
-can select amongst crawls which exist in a crawl folder as to which crawl you
+<li><b>Make it easy to archive crawls.</b> Crawls are stored in timestamped
+folders that can be moved around, zipped, etc. Through the admin interface you
+can select which of the crawls in a crawl folder you
 want to serve results from.</li>
 <li><b>Make it easy to crawl archives.</b> There are many sources of
 raw web data available today such as files that use the Internet Archive's
-arc format, Open Directory Project RDF data, Wikipedia xml dumps, etc. Yioop!
-can index these formats directly, allowing one to get an index for these
+arc format, Open Directory Project RDF data, Wikipedia xml dumps, etc. Yioop
+can index these formats directly, allowing one to get an index for these
 high-value sites without needing to do an exhaustive crawl.</li>
-</ul>
+</ul>
\ No newline at end of file
diff --git a/en-US/pages/install.thtml b/en-US/pages/install.thtml
index 9398423..663711d 100755
--- a/en-US/pages/install.thtml
+++ b/en-US/pages/install.thtml
@@ -2,32 +2,33 @@
     <ul>
         <li><a href="#xampp">XAMPP on Windows</a></li>
         <li><a href="#wamp">WAMP</a></li>
-        <li><a href="#linux">Ubuntu Linux</a></li>
         <li><a href="#osx">Mac OSX / Mac OSX Server</a></li>
+        <li><a href="#ubuntu">Ubuntu Linux</a></li>
+        <li><a href="#centos">Centos Linux</a></li>
         <li><a href="#cpanel">CPanel</a></li>
         <li><a href="#multiple">System with Multiple Queue Servers</a></li>
     </ul>

 <h2 id="xampp">XAMPP on Windows</h2>
 <ol>
-<li>Download <a
+<li>Download <a
     href="http://technet.microsoft.com/en-us/sysinternals/bb896649">pstools</a>
     (which contains psexec).</li>
-<li>Download <a
+<li>Download <a
     href="http://www.apachefriends.org/en/xampp-windows.html">Xampp</a>
 (Note: Yioop! 0.9 or higher works with the latest version;
 Yioop! 0.88 or lower works up to Xampp 1.7.7)</li>
 <li>Install xampp</li>
 <li>Copy PsExec from the pstools zip folder to C:\xampp\php</li>
-<li>Open control panel. Go to System =&gt; Advanced system settings =&gt;
+<li>Open control panel. Go to System =&gt; Advanced system settings =&gt;
 Advanced. Click on Environment Variables. Look under System Variables and
 select Path. Click Edit. Tack onto the end of the Variable Values:
 <pre>
 ;C:\xampp\php;
 </pre>
 Click OK a bunch of times to get rid of the windows. Close the control panel window.
-Reopen it and go to the same place to make sure the path variable really
-was changed.
+Reopen it and go to the same place to make sure the path variable really
+was changed.
 </li>
 <li>Edit the file C:\xampp\php\php.ini in Notepad. Search on curl:
 change the line:
@@ -46,14 +47,15 @@ to
 <pre>
 post_max_size = 32M
 </pre>
-Start Apache.</li>
-<li>Download <a href="http://www.seekquarry.com/viewgit/?a=summary&p=yioop"
+Start Apache. This change is not strictly necessary, but will improve
+performance.</li>
+<li>Download <a href="http://www.seekquarry.com/viewgit/?a=summary&amp;p=yioop"
 >Yioop!</a> (you should choose some version &gt; 0.88 or latest)
 Unzip it into
 <pre>
 C:\xampp\htdocs
 </pre>
-Rename the downloaded folder yioop (so now have
+Rename the downloaded folder yioop (so now have
 a folder C:\xampp\htdocs\yioop).
 </li>
 <li>
@@ -96,22 +98,22 @@ Submit
 </pre>
 </li>
 <li>You might need to restart the machine to get the next steps to work</li>
-<li>In Manage Machines, click ON on the queue server and on your fetcher.
+<li>In Manage Machines, click ON on the queue server and on your fetcher.
 For each, click
 on the log file and make sure that after at most two minutes you are seeing log
 entries appear.</li>
-<li>Now go to Manage Crawls. Click on Options.
-Set the options you would like for your crawl,
+<li>Now go to Manage Crawls. Click on Options.
+Set the options you would like for your crawl,
 click Save.</li>
-<li>Type name of the crawl and start crawl. Let it crawl for a while,
+<li>Type a name for the crawl and start the crawl. Let it crawl for a while,
 till you see the Total URLs Seen > 1.
-<li>Click stop crawl and waited for the crawl to appear in the previous
+<li>Click stop crawl and wait for the crawl to appear in the previous
 crawls list. Set it as the default crawl. Then you can search using this index.
 </li>
 </ol>
 <p>
 The above set-up is for a non-command line crawl, and it works as described.
-For command line crawls on versions of Yioop prior to Version 0.9 you might
+For command line crawls on versions of Yioop prior to Version 0.9 you might
 have the problem that log messages are written to Xampp's PHP error log
 because Yioop uses the PHP error_log function and on Xampp this is where
 it defaults to. This is not an issue in Version 0.9 or above.
@@ -124,24 +126,25 @@ These instructions should work for Yioop! Version 0.84 and above.
 WampServer allows you to run a 64 bit version of PHP.
 </p>
 <ol>
-<li>Download <a
-    href="http://technet.microsoft.com/en-us/sysinternals/bb896649">pstools
+<li>Download <a
+    href="http://technet.microsoft.com/en-us/sysinternals/bb896649">pstools
     (which contains psexec)</a>.</li>
-<li>Download <a
-    href="http://www.wampserver.com/en/">WampServer</a> (Note: Yioop! 0.9 or
+<li>Download <a
+    href="http://www.wampserver.com/en/">WampServer</a> (Note: Yioop! 0.9 or
 higher works with PHP 5.4)</li>
-<li>Download <a href="http://www.seekquarry.com/viewgit/?a=summary&p=yioop"
+<li>Download <a href="http://www.seekquarry.com/viewgit/?a=summary&amp;p=yioop"
 >Yioop!</a> (you should choose some version &gt; 0.88 or latest)
 Unzip it into
 <pre>
 C:\wamp\www
 </pre>
-Rename the downloaded folder yioop (so now have
+Rename the downloaded folder to yioop (so you now have
 a folder C:\wamp\www\yioop).</li>
 <li>Edit php.ini to enable multicurl and change the post_max_size. To do
 this use the Wamp dock tool and navigate to wamp =&gt; php =&gt; extension.
 Turn on curl. Next navigate to wamp =&gt; php =&gt; php.ini .
-Do a find on post_max_size and set its value to 32MB.</li>
+Do a find on post_max_size and set its value to 32M. The post_max_size change
+is not strictly necessary, but will improve performance.
 </li>
 <li>Wamp has two php.ini files. The one we just edited is in
 <pre>
@@ -156,7 +159,7 @@ Open this php.ini in Notepad search on curl then uncomment the line. Similarly,
 edit post_max_size and set it to 32M.
 </li>
 <li>Copy PsExec.exe to C:\wamp\bin\php\php5.3.10 .</li>
-<li>Go  to control panel =&gt; system =&gt; advanced system settings =>
+<li>Go to control panel =&gt; system =&gt; advanced system settings =&gt;
 advanced =&gt; environment variables =&gt; system variables =&gt;path.
 Click edit and add to the path variable:
 <pre>
@@ -165,14 +168,14 @@ Click edit and add to the path variable:
 Exit control panel, then re-enter to double-check that the path really was
 added to the end.</li>
 <li> Next go to
-wamp =&gt; apache =&gt; restart service. In a browser, go to
+wamp =&gt; apache =&gt; restart service. In a browser, go to
 http://localhost/yioop/ . You should see a configure screen
 where you can enter C:/yioop_data for the Work Directory. It
 will ask you to re-login. Use the login: root and no password.
 Now go to Yioop =&gt;
 Configure and input the following settings:
 <pre>
-Search Engine Work Directory: C:/yioop_data
+Search Engine Work Directory: C:/yioop_data
 Default Language: English
 (initially only the above)
 Debug Display: (all checked)
@@ -204,12 +207,153 @@ Submit
 <li>Go to Manage Crawls. Click on the options to set up where you want to crawl.
 Type in a name for the crawl and click start crawl.</li>
 <li>Let it crawl for a while, till you see the Total URLs Seen &gt; 1.</li>
-<li>Then click Stop Crawl and wait for the crawl to appear in the previous
+<li>Then click Stop Crawl and wait for the crawl to appear in the previous
 crawls list. Set it as the default crawl. You should be
 able to search using this index.
 </li>
 </ol>
-<h2 id="linux">Ubuntu Linux</h2>
+
+<h2 id="osx">Mac OSX / Mac OSX Server</h2>
+<p>The instructions given here are for OSX Mountain Lion. Apple changes
+the locations of files slightly between
+versions, so you might have to do a little exploration to find things
+on earlier OSX versions. </p>
+<ol>
+<li>Turn on Apache with PHP enabled.
+<ul>
+<li><b>Not OSX Server:</b> On traditional (pre-Mountain Lion) OSX, one
+could go to System Preferences =&gt; Sharing, and turn on Web Sharing to
+get the web server running. This option was removed in Mountain Lion; however,
+from the command line (Terminal), one can type:
+<pre>
+sudo apachectl start
+</pre>
+to start the Web server, and similarly,
+<pre>
+sudo apachectl stop
+</pre>
+to stop it. Alternatively, to make the web server start each time the
+machine is turned on, one can type:
+<div><br />
+<tt>sudo defaults write /System/Library/LaunchDaemons/org.apache.httpd</tt>
+<tt>Disabled -bool false</tt>
+</div>
+<br />By default, document root is
+/Library/WebServer/Documents. The configuration files for Apache in
+this setting are located in /etc/apache2. If you want to tweak document
+root or other apache settings, look in the folder /etc/apache2/other and
+edit appropriate files such as httpd-vhosts.conf or httpd-ssl.conf.
+Before turning on Web Sharing / the
+web server, you will want to edit the file /etc/apache2/httpd.conf, replacing
+<pre>
+#LoadModule php5_module libexec/apache2/libphp5.so
+</pre>
+with
+<pre>
+LoadModule php5_module libexec/apache2/libphp5.so
+</pre>
+</li>
+<li><b>OSX Server:</b> Pre-Mountain Lion, OSX Server used /etc/apache2
+to store its configuration files; since Mountain Lion these files are in
+/Library/Server/Web/Config/apache2. Within this folder, the sites folder
+holds Apache directives for specific virtual hosts. OSX Server comes
+with Server.app, which will actively fight any direct tweaking of
+configuration files. To get the web server running from Server.app, click on
+Websites. Make sure "Enable PHP web applications" is checked and Websites
+is On. The default web site is /Library/Server/Web/Data/Sites/Default; you
+probably want to click on + under Websites and specify whatever document root
+you like.
+</li>
+</ul>
+</li>
+<li>
+Modify the php.ini file, which is likely at /private/etc/php.ini.
+You want to change
+<pre>
+post_max_size = 8M
+to
+post_max_size = 32M
+</pre>
+Restart the web server after making this change. This change is not strictly
+necessary, but will improve performance.
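+For the non-Server set-up described above, one way to do the restart is:
+<pre>
+sudo apachectl restart
+</pre>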
+</li>
+<li>
+We are going to configure Yioop so that fetchers and queue_servers
+can be started from the GUI interface. On an OSX machine, Yioop makes
+use of the Unix "at" command. On OSX, to enable "at" jobs, you might need to
+type:
+<pre>
+sudo launchctl load -w /System/Library/LaunchDaemons/com.apple.atrun.plist
+</pre>
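+To verify the at daemon got loaded, something like
+<pre>
+sudo launchctl list | grep atrun
+</pre>
+should print an entry for com.apple.atrun.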
+</li>
+<li>For the remainder of this guide, we assume the document root for
+the web server is /Library/WebServer/Documents.
+<a href="http://www.seekquarry.com/viewgit/?a=summary&amp;p=yioop"
+>Download Yioop</a>, unpack it into /Library/WebServer/Documents and rename
+the Yioop folder to yioop.</li>
+<li>Make a folder for your crawl data:
+<pre>
+sudo mkdir /Library/WebServer/Documents/yioop_data
+sudo chmod 777 /Library/WebServer/Documents/yioop_data
+</pre>
+You probably want to make sure Spotlight (Mac's built-in file and folder
+indexer) doesn't index this folder -- especially during a crawl -- or your
+system might really slow down. To prevent this, open System Preferences,
+choose Spotlight, select the Privacy tab, and add the above folder to the list
+of folders Spotlight shouldn't index. If you are storing crawls on an
+external drive, you might want to make sure that drive gets automounted
+without a login, in the event of a power failure that exceeds your backup power
+supply time. To do this you can write the preference:
+<div><br /><tt>
+sudo defaults write /Library/Preferences/SystemConfiguration</tt
+><tt>/autodiskmount AutomountDisksWithoutUserLogin -bool true</tt>
+</div><br />
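+If the crawl data lives on its own external volume, an alternative to the
+Spotlight Privacy tab is to turn indexing off for that whole volume; the
+volume name here is just an example:
+<pre>
+sudo mdutil -i off /Volumes/CrawlDrive
+</pre>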
+</li>
+<li>In a browser, go to the page http://localhost/yioop/ .
+You should see a configure screen
+where you can enter /Library/WebServer/Documents/yioop_data for the
+Work Directory. It will ask you to re-login. Use the login: root and no
+password. Now go to Yioop =&gt;
+Configure and input the following settings:
+<pre>
+Search Engine Work Directory: /Library/WebServer/Documents/yioop_data
+Default Language: English
+Debug Display: (all checked)
+Search access: (all checked)
+Database Set-up: (left unchanged)
+Search Auxiliary Links Displayed: (all checked)
+Name Server Set-up
+Server Key: 0
+Name Server Url: http://localhost/yioop/
+Crawl Robot Name: TestBot
+Robot Instance: A
+Robot Description: TestBot should be disallowed from everywhere because
+the installer of Yioop did not customize this to his system.
+Please block this ip.
+</pre>
+</li>
+<li>Go to Manage Machines. Add a single machine under Add Machine using the
+settings:
+<pre>
+Machine Name: Local
+Machine Url: http://localhost/yioop/
+Is Mirror: (uncheck)
+Has Queue Server: (check)
+Number of Fetchers: 1
+Submit
+</pre>
+</li>
+<li>Under Machine Information turn the Queue Server and Fetcher On.</li>
+<li>Go to Manage Crawls. Click on the options to set up where you want to crawl.
+Type in a name for the crawl and click start crawl.</li>
+<li>Let it crawl for a while, till you see the Total URLs Seen &gt; 1.</li>
+<li>Then click Stop Crawl and wait for the crawl to appear in the previous
+crawls list. Set it as the default crawl. You should be
+able to search using this index.
+</li>
+</ol>
+
+<h2 id="ubuntu">Ubuntu Linux</h2>
 <ol>
 <li>Get PHP and Apache set up by running the following commands as needed
 (you might have already done some):
@@ -225,16 +369,17 @@ sudo apt-get install php5-gd
 </li>
 <li>After this sequence, the files /etc/apache2/mods-enabled/php5.conf
 and /etc/apache2/mods-enabled/php5.load should exist and link
-to the corresponding files in /etc/apache2/mods-available. The configuration
+to the corresponding files in /etc/apache2/mods-available. The configuration
 files for php are /etc/php5/apache2/php.ini (for the apache module)
-and /etc/php5/cli/php.ini (for the command-line interpreter).
-You want to make changes to both configurations. Using your favorite
+and /etc/php5/cli/php.ini (for the command-line interpreter).
+You want to make changes to both configurations. Using your favorite
 text editor (vi, nano, gedit, etc.), modify the line:
 <pre>
 post_max_size = 8M
 to
 post_max_size = 32M
 </pre>
+This change is not strictly necessary, but will improve performance.
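+A quick way to confirm both configurations were updated:
+<pre>
+grep post_max_size /etc/php5/apache2/php.ini /etc/php5/cli/php.ini
+</pre>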
 </li>
 <li>Looking in the folders /etc/php5/apache2/conf.d and
 /etc/php5/cli/conf.d you can see which extensions are being loaded
@@ -249,15 +394,15 @@ sudo apachectl start
 <li>We are going to configure Yioop so that fetchers and queue_servers
 can be started from the GUI interface. On a Linux machine, Yioop makes
 use of the Unix "at" command. Under Ubuntu, "at" will typically be enabled,
-however, you might need to give your web server access to schedule
+however, you might need to give your web server access to schedule
 "at" jobs. To do this, check that the web server user (www-data)
 is not in the file /etc/at.deny.
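+From a terminal, one quick check (assuming the default web server user is
+www-data) is:
+<pre>
+grep www-data /etc/at.deny
+</pre>
+If this prints a match, remove that line from /etc/at.deny.</li>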
-<li>The DocumentRoot for web sites (virtual hosts) served by an Ubuntu Linux
+<li>The DocumentRoot for web sites (virtual hosts) served by an Ubuntu Linux
 machine is typically specified by files in /etc/apache2/sites-enabled.
 In this example, it was given in a file 000-default and specified to
 be /var/www/.</li>
-<li><a href="http://www.seekquarry.com/viewgit/?a=summary&p=yioop"
->Download Yioop</a>, unpack it into /var/www and use
+<li><a href="http://www.seekquarry.com/viewgit/?a=summary&amp;p=yioop"
+>Download Yioop</a>, unpack it into /var/www and use
 mv to rename the Yioop folder to yioop.</li>
 <li>Make a folder for your crawl data:
 <pre>
@@ -265,7 +410,7 @@ sudo mkdir /var/www/yioop_data
 sudo chmod 777 /var/www/yioop_data
 </pre>
 </li>
-<li>In a browser, go to the page http://localhost/yioop/ .
+<li>In a browser, go to the page http://localhost/yioop/ .
 You should see a configure screen
 where you can enter /var/www/yioop_data for the Work Directory. It
 will ask you to re-login. Use the login: root and no password.
@@ -303,112 +448,104 @@ Submit
 <li>Go to Manage Crawls. Click on the options to set up where you want to crawl.
 Type in a name for the crawl and click start crawl.</li>
 <li>Let it crawl for a while, till you see the Total URLs Seen &gt; 1.</li>
-<li>Then click Stop Crawl and wait for the crawl to appear in the previous
+<li>Then click Stop Crawl and wait for the crawl to appear in the previous
 crawls list. Set it as the default crawl. You should be
 able to search using this index.
 </li>
 </ol>
-<h2 id="osx">Mac OSX / Mac OSX Server</h2>
-<p>The instructions given here are for OSX Mountain Lion, Apple changes
-the positions with which files can be found slightly between
-versions, so you might have to do a little exploration to find things
-for earlier OSX versions. </p>
+
+<h2 id="centos">Centos Linux</h2>
+<p>These instructions were tested by running a
+<a href="http://virtualboxes.org/images/centos/">Centos 6.3 image</a> in
+<a href="https://www.virtualbox.org/">VirtualBox</a>. The keyboard settings
+for the particular image on the VirtualBox site are Italian, so you will
+have to tweak them to get an American keyboard or the keyboard you are most
+comfortable with. To get started, log in as user centos, then
+launch a terminal window and su to root.
+</p>
 <ol>
-<li>Turn on Apache with PHP enabled.
-<ul>
-<li><b>Not OSX Server:</b> Traditionally, (pre-Mountain Lion) OSX, one
-could go to Control Panel =&gt; Sharing, and turn on Web Sharing to
-get the web server running. This option was removed in Mountain Lion, however,
-from the command line (Terminal), one can type:
-<pre>
-sudo apachectl start
-</pre>
-to start the Web server, and similarly,
+<li>The image comes with neither Apache nor the nano editor installed.
+These can be installed with the commands:<br />
 <pre>
-sudo apachectl stop
+yum install httpd
+yum install nano
 </pre>
-to stop it. Alternatively, to make it so the WebServer starts each time the
-machine is turned on one can type:
+If you didn't su to root, then you will need to put sudo before all commands
+in this guide, and you will have to make sure the user you are running
+under is in the list of sudoers.
+</li>
+<li>
+Apache's configuration files are in the /etc/httpd directory. To
+get rid of the default web landing page, we switch into the conf.d subfolder
+and disable welcome.conf. To do this, first type the commands:
 <pre>
-sudo defaults write /System/Library/LaunchDaemons/org.apache.httpd Disabled -bool false
+cd /etc/httpd/conf.d
+nano welcome.conf
 </pre>
-By default, document root is
-/Library/WebServer/Documents. The configuration files for Apache in
-this setting are located in /etc/apache2. If you want to tweak document
-root or other apache settings, look in the folder /etc/apache2/other and
-edit appropriate files such as httpd-vhosts.conf or httpd-ssl.conf .
-Before turning on Web Sharing / the
-web server, you would want to edit the file /etc/apache/httpd.conf, replace
+Then, using the editor, put #'s at the start of each line and save the result.
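+Alternatively, a one-line way to comment out every line of the file (GNU sed,
+as shipped with Centos) is:
+<pre>
+sed -i 's/^/#/' welcome.conf
+</pre>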
+</li>
+<li>Yioop needs to be able to issue shell commands to start and stop
+machines. In particular, it uses the "at daemon" (atd) to do this.
+The web server on Centos runs as user apache and by default its shell is
+specified as noshell. Also, Centos makes use of SELinux and the domain
+under which Apache runs prevents it from issuing at commands as well.
+You probably want to use audit2allow and semanage to configure exactly
+the settings you need to get Yioop to run. For expediency,
+however, one can type:
 <pre>
-#LoadModule php5_module libexec/apache2/libphp5.so
+usermod -s /bin/sh apache
+chcon -t unconfined_exec_t /usr/sbin/httpd
 </pre>
-with
+Please do not use the above in a production environment!
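+For the more careful route mentioned above, the usual audit2allow workflow is
+roughly the following: trigger the denials by trying to start machines from
+Yioop, then build and install a local policy module (the module name yioop_at
+here is arbitrary):
+<pre>
+grep httpd /var/log/audit/audit.log | audit2allow -M yioop_at
+semodule -i yioop_at.pp
+</pre>
+Review the generated yioop_at.te file before installing the module.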
+</li>
+<li>Next we install git, php, and the various php extensions we need:
 <pre>
-LoadModule php5_module libexec/apache2/libphp5.so
+yum install git
+yum install php
+yum install php-mbstring
+yum install php-sqlite3
+yum install gd
+yum install php-gd
 </pre>
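+To double-check that the extensions loaded, you can list PHP's modules:
+<pre>
+php -m | grep -E 'mbstring|gd|sqlite'
+</pre>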
 </li>
-<li><b>OSX Server:</b> Pre-mountain lion, OSX Server used /etc/apache2
-to store its configuration files, since Mountain Lion these files are in
-/Library/Server/Web/Config/apache2 . Within this folder, the sites folder
-holds Apache directives for specific virtual hosts. OSX Server comes
-with Server.app which will actively fight any direct tweaking to configuration
-files. From Server.app to get the web server running click on Websites.
-Make sure "Enable PHP web applications" is checked and Websites is On.
-The default web site is /Library/Server/Web/Data/Sites/Default , you
-probably want to click on + under websites and specify document root to
-be as you like.
-</li>
-</ul>
-</li>
-<li>
-Modify the php.ini file, this is likely in the file /private/etc/php.ini.
-You want to change
+<li>The default Apache DocumentRoot under Centos is /var/www/html. We will
+install Yioop in a folder /var/www/html/yioop. This could be accessed
+by pointing a browser at http://127.0.0.1/yioop/ .
+To download Yioop to /var/www/html/yioop and to create a work directory,
+we run the commands:
 <pre>
-post_max_size = 8M
-to
-post_max_size = 32M
+cd /var/www/html
+git clone http://seekquarry.com/git/yioop.git yioop
+mkdir yioop_data
+chmod 777 yioop_data
 </pre>
-Restart the web server after making this change.
 </li>
 <li>
-We are going to configure Yioop so that fetchers and queue_servers
-can be started from the GUI interface. On an OSX machine, Yioop makes
-use of the Unix "at" command. On OSX to enable  "at" jobs, you might need to
-type:
+If the web server and atd are not running, start them:
 <pre>
-sudo launchctl load -w /System/Library/LaunchDaemons/com.apple.atrun.plist
+service httpd start
+service atd start
 </pre>
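+To have both services start automatically at boot (Centos 6 uses chkconfig):
+<pre>
+chkconfig httpd on
+chkconfig atd on
+</pre>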
 </li>
-<li>For the remainder of this guide, we assume document root for
-the web server is: /Library/WebServer/Documents.
-<a href="http://www.seekquarry.com/viewgit/?a=summary&p=yioop"
->Download Yioop</a>, unpack it into /Library/WebServer/Documents and rename
-the Yioop folder to yioop.</li>
-<li>Make a folder for your crawl data:
+<li>Tell Yioop where its work directory is:
 <pre>
-sudo mkdir /Library/WebServer/Documents/yioop_data
-sudo chmod 777 /Library/WebServer/Documents/yioop_data
-</pre>
-You probably want to make sure Spotlight (Mac's built-in file and folder
-indexer) doesn't index this folder -- especially during a crawl -- or you
-system might really slow down. To prevent this open Control Panel, choose
-Spotlight, select the Privacy tab, and add the above folder to the list
-of folder Spotlight shouldn't index. If you are storing crawls on an
-external drive, you might want to make sure that drive gets automounted
-without a login, in the event of a power failure that exceeds your backup power
-supply time. To do this you can write the preference:
-<pre>
-sudo defaults write /Library/Preferences/SystemConfiguration/autodiskmount AutomountDisksWithoutUserLogin -bool true
+cd /var/www/html/yioop/configs
+php configure_tool.php
+
+select option (1) Create/Set Work Directory
+enter /var/www/html/yioop_data
+then select option (1) to confirm the change.
+exit program
 </pre>
 </li>
-<li>In a browser, go to the page http://localhost/yioop/ .
+<li>In a browser, go to the page http://localhost/yioop/ .
 You should see a configure screen
-where you can enter /Library/WebServer/Documents/yioop_data for the
-Work Directory. It will ask you to re-login. Use the login: root and no password.
+where you can enter /var/www/html/yioop_data for the Work Directory. It
+will ask you to re-login. Use the login: root and no password.
 Now go to Yioop =&gt;
 Configure and input the following settings:
 <pre>
-Search Engine Work Directory: /Library/WebServer/Documents/yioop_data
+Search Engine Work Directory: /var/www/html/yioop_data
 Default Language: English
 Debug Display: (all checked)
 Search access: (all checked)
@@ -439,7 +576,7 @@ Submit
 <li>Go to Manage Crawls. Click on the options to set up where you want to crawl.
 Type in a name for the crawl and click start crawl.</li>
 <li>Let it crawl for a while, till you see the Total URls Seen &gt; 1.</li>
-<li>Then click Stop Crawl and wait for the crawl to appear in the previous
+<li>Then click Stop Crawl and wait for the crawl to appear in the previous
 crawls list. Set it as the default crawl. You should be
 able to search using this index.
 </li>
@@ -451,17 +588,17 @@ Generally, it is not practical to do your crawling in a cPanel hosted website.
 However, cPanel works perfectly fine for hosting the results of a crawl you did
 elsewhere. Here it is briefly described how to do this. When capacity planning
 your installation, as a rule of thumb, you should
-expect your index to be of comparable size (number of bytes) to the sum of
+expect your index to be of comparable size (number of bytes) to the sum of
 the sizes of the pages you downloaded.
 </p>
 <ol>
-<li>Download <a href="http://www.seekquarry.com/viewgit/?a=summary&p=yioop"
->Yioop!</a> (you should choose some version &gt; 0.88 or latest)
+<li>Download <a href="http://www.seekquarry.com/viewgit/?a=summary&amp;p=yioop"
+>Yioop</a> (you should choose some version &gt; 0.88 or latest)
 to your local machine.</li>
-<li>In cPanel go to File Manager and navigate to the place you want on your
-server to serve Yioop from. Click upload and choose your zip file so as to
+<li>In cPanel go to File Manager and navigate to the place you want on your
+server to serve Yioop from. Click upload and choose your zip file so as to
 upload it to that location.</li>
-<li>Select the uploaded file and click extract to extract the zip file to a
+<li>Select the uploaded file and click extract to extract the zip file to a
 folder. Reload the page. Rename the extracted folder, if necessary.
 </li>
 <li>For the rest of these instructions, let's assume it was mysite
@@ -486,66 +623,66 @@ http://mysite.my/yioop/
 </pre>
 you should see a place to enter a work directory path.
 </li>
-<li>The work directory must be an absolute path. In the cPanel FileManager
-next at the top
-of the directory tree in the left hand side of the screen it lists the file
+<li>The work directory must be an absolute path. In the cPanel FileManager
+at the top
+of the directory tree on the left hand side of the screen, it lists a file
 path such as
 <pre>
 /public_html/mysite.my/yioop/configs
 </pre>
-(if we still happened to be in the configs directory).
-You want to make this a full path. Typically, this means tacking on
+(if we still happened to be in the configs directory).
+You want to make this a full path. Typically, this means tacking on
 /home/username (what you log in with) to the path so far.
 To keep things simple set the work directory to be:
 <pre>
 /home/username/public_html/mysite.my/yioop_data
 </pre>
-Here username should be your user name. After filling in this as the
-Work Directoryclick Load or Create. You will see it briefly display a
-complete profile page then log you out saying you must login with username
-root password blank Re-Login.
+Here username should be your user name. After filling this in as the
+Work Directory, click Load or Create. You will see it briefly display a
+complete profile page, then log you out, saying you must Re-Login with
+username root and a blank password.
 </li>
 <li>Go to Manage account and give yourself a better login and password.</li>
-<li><p>Go to configure. Many cPanel installation still use PHP 5.2 so you might
+<li><p>Go to Configure. Many cPanel installations still use PHP 5.2, so you
+might see:
 see:
 <pre>
-The following required items were missing:
+The following required items were missing:
 PHP Version 5.3 or Newer
 </pre>
 This means you won't be able to crawl from within cPanel, but you will still be
-able to serve search results. To do this, perform a crawl elsewhere,
+able to serve search results. To do this, perform a crawl elsewhere,
 for instance on your laptop.</li>
-<li>After performing a crawl, go to Manage Crawls
+<li>After performing a crawl, go to Manage Crawls
 on the machine where you performed the crawl.
-Look under Previous Crawls and locate the crawl you want to upload.
+Look under Previous Crawls and locate the crawl you want to upload.
 Note its timestamp.</li>
-<li>Go to THIS_MACHINES_WORK_DIRECTORY/cache . Locate the folder
+<li>Go to THIS_MACHINES_WORK_DIRECTORY/cache . Locate the folder
 IndexDatatimestamp, where timestamp is the timestamp of the crawl you want.
 ZIP this folder.
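+If the machine you crawled on is a Mac or Linux box, one way to make the ZIP
+from a terminal is (here timestamp stands for your crawl's timestamp):
+<pre>
+cd THIS_MACHINES_WORK_DIRECTORY/cache
+zip -r IndexDatatimestamp.zip IndexDatatimestamp
+</pre></li>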
-<li>In FileManager, under cPanel on the machine you want to host your crawl,
+<li>In FileManager, under cPanel on the machine you want to host your crawl,
 navigate to
 <pre>
 yioop_data/cache
 </pre>
 Upload the ZIP and extract it.</li>
 <li>Go to Manage Crawls on this instance of Yioop,
-locate this crawl under Previous Crawls and set it as the default crawl.
+locate this crawl under Previous Crawls and set it as the default crawl.
 You should now be able to search and get results from the crawl.
 </li>
 </ol>
 <p>
-You will probably want to uncheck Cache in the Configure activity as in this
-hosted setting it is somewhat hard to get the cache page feature of Yioop! to
+You will probably want to uncheck Cache in the Configure activity, as in this
+hosted setting it is somewhat hard to get the cache page feature of Yioop to
 work.
 </p>

 <h2 id="multiple">System with Multiple Queue Servers</h2>
 <p>
-This section assumes you have already successfully installed and performed
-crawls with Yioop! in the single queue_server setting and have succeeded to use
-the Manage Machines to start and stop a queue_server and fetcher. If not, you
+This section assumes you have already successfully installed and performed
+crawls with Yioop in the single queue_server setting and have succeeded in
+using Manage Machines to start and stop a queue_server and fetcher. If not,
+you
 should consult one of the installation guides above or the general
-<a href="http://localhost/git/seek_quarry/?c=main&p=documentation">Yioop
+<a href="?c=main&amp;p=documentation">Yioop
 Documentation</a>.
 </p>
 <p>
@@ -553,36 +690,36 @@ Before we begin, what are the advantages in using more than one queue_server?
 </p>
 <ol>
 <li>If the queue_servers are running on different processors then they can each
-be indexing part of the crawl data independently and so this can speed up
+be indexing part of the crawl data independently and so this can speed up
 indexing</li>
-<li>After the crawl is done, the index will typically exist on multiple
+<li>After the crawl is done, the index will typically exist on multiple
 machines and each needs to search a smaller amount of data before sending it to
 the name server for final merging. So queries can be faster.</li>
 </ol>
 <p>
-For the purposes of this post we will consider the case of two queue_servers,
+For the purposes of this guide we will consider the case of two queue_servers;
 the same idea works for more. To keep things especially simple, we have both of
  these queue_servers on the same laptop. Advantages
-(1) and (2) will likely not apply in this case, but we are describing this
-for testing purposes -- you can take the same idea and have the queue servers
+(1) and (2) will likely not apply in this case, but we are describing this
+for testing purposes -- you can take the same idea and have the queue servers
 on different machines after going through this tutorial.
 </p>

 <ol>
-<li>Download and install yioop as you would in the single queue_server case.
+<li>Download and install Yioop as you would in the single queue_server case.
 But do this twice. For example, on your machine, under document root you
-might have two subfolders
+might have two subfolders
 <pre>
-git/yioop1
+git/yioop1
 </pre>
-and
+and
 <pre>
 git/yioop2
 </pre>
-each with a complete copy of yioop.
-We will use the copy git/yioop1 as an instance of Yioop with both a name_server
+each with a complete copy of yioop.
+We will use the copy git/yioop1 as an instance of Yioop with both a name_server
 and a queue_server; the git/yioop2 will be an instance with just a
-queue_server.
+queue_server.
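+One way to get the two copies (assuming your document root contains a git
+subfolder as above; DOCUMENT_ROOT is a stand-in for your web server's
+document root):
+<pre>
+cd DOCUMENT_ROOT/git
+git clone http://seekquarry.com/git/yioop.git yioop1
+git clone http://seekquarry.com/git/yioop.git yioop2
+</pre>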
 </li>
 <li>
 On the Configure element of the git/yioop1 instance, set the work directory
@@ -595,16 +732,16 @@ for the git/yioop2 instance we set it to be
 /Applications/XAMPP/xamppfiles/htdocs/crawls2
 </pre>
 i.e., the work directories of these two instances should be different!
-For each crawl in the multiple queue_server setting, each instance will
-have a copy of those documents it is responsible for. So if we did a crawl with
+For each crawl in the multiple queue_server setting, each instance will
+have a copy of those documents it is responsible for. So if we did a crawl with
 timestamp 10, each instance would have a WORK_DIR/cache/IndexData10
 folder and these folders would be disjoint in their contents from any other
 instance.
 </li>
 <li>
 Continuing down on the Configure element for each instance, make sure under the
-Search Access fieldset Web, RSS, and API are checked.</li>
-<li>Next make sure the name server and server key are the same for both
+Search Access fieldset Web, RSS, and API are checked.</li>
+<li>Next make sure the name server and server key are the same for both
 instances, i.e., in the Name Server Set-up fieldset, one might set:
 <pre>
 Server Key:123
@@ -618,12 +755,12 @@ TestBotFeelFreeToBan
 </pre>
 but we want the robot instance to be different, say 1 and 2.
 </li>
-<li>Go to the Manage Machine element for git/yioop1, which is the name server.
+<li>Go to the Manage Machine element for git/yioop1, which is the name server.
 Only the name server needs to manage machines,
-so we won't do this for git/yioop2 (or for any other queue servers
+so we won't do this for git/yioop2 (or for any other queue servers
 if we had them).</li>
 <li>Add machines for each yioop instance we want to manage with the name server.
-In this particular case, fill out and submit the Add Machine form twice,
+In this particular case, fill out and submit the Add Machine form twice,
 the first time with:
 <pre>
 Machine Name:Local1
@@ -642,41 +779,41 @@ Num Fetchers: 1
 </pre>
 The Machine Name should be different for each Yioop instance, but can otherwise
 be whatever you want. Is Mirror controls whether this is a replica of some other
-node -- I'll save that for a different install guide at some point. If we
-wanted to run more fetchers  we could have chosen a bigger number for
+node -- I'll save that for a different install guide at some point. If we
+wanted to run more fetchers, we could have chosen a bigger number for
 Num Fetchers (fetchers are the processes that download web pages).
 </li>
 <li>
-After the above steps, there should be two machines listed under
-Machine Information.  Click the On button on the queue server and the
-fetcher of both of them. They  should turn green. If you click the log link
-you should start seeing new  messages (it refreshes once every 30 secs) after
+After the above steps, there should be two machines listed under
+Machine Information. Click the On button on the queue server and the
+fetcher of both of them. They should turn green. If you click the log link
+you should start seeing new messages (it refreshes once every 30 secs) after
 at most a minute or so.
 </li>
 <li>
-At this point you are ready to crawl in the multiple queue server setting. You
-can use Manage Crawl to set-up, start and stop a crawl exactly as in the single
-queue_server setting.
+At this point you are ready to crawl in the multiple queue server setting. You
+can use Manage Crawl to set up, start, and stop a crawl exactly as in the
+single
+queue_server setting.
 </li>
 <li>
 Perform a crawl and set it as the default index. You can
-then turn off all the queue servers and fetchers in Manage Machines, if you
+then turn off all the queue servers and fetchers in Manage Machines, if you
 like.</li>
 <li>
 If you type a query into the search bar of the name server (git/yioop1),
 you should be getting merged results from both queue servers. To check
-if this is working... Under configure on the name server (git/yioop1) make sure
+if this is working, under Configure on the name server (git/yioop1) make sure
 Query Info is checked and that
-Use Memcache and Use FileCache are not checked -- the latter two are for
-testing, we can check them later when we know things are working. When you
-perform a query now, at the bottom of the page you should see a horizontal
+Use Memcache and Use FileCache are not checked -- the latter two are for
+testing; we can check them later when we know things are working. When you
+perform a query now, at the bottom of the page you should see a horizontal
 rule followed by Query Statistics followed
 by all the queries performed in calculating results. One of these should be
 PHRASE QUERY. Underneath it you should see Lookup Offset Times and beneath this
 Machine Subtimes: ID_0 and ID_1. If these appear, you know it's working.
 </li>
 </ol>
-<p>When a query is typed into the name server it tacks no:network onto it
+<p>When a query is typed into the name server it tacks no:network onto it
 and asks it of all the queue servers, then merges the results.
 So if you type "hello" as the search, i.e., if you go to the url
 <pre>
@@ -684,10 +821,11 @@ http://localhost/git/yioop1/?q=hello
 </pre>
 the git/yioop1 script will make in parallel the curl requests
 <pre>
-http://localhost/git/yioop1/?q=hello&ne ... alse&raw=1 (raw=1 means no grouping)
-http://localhost/git/yioop2/?q=hello&ne ... alse&raw=1
+http://localhost/git/yioop1/?q=hello&amp;ne ... alse&amp;raw=1
+    (raw=1 means no grouping)
+http://localhost/git/yioop2/?q=hello&amp;ne ... alse&amp;raw=1
 </pre>
 get the results back, merge them, and finally return the result to the user.
-The network=false tells http://localhost/git/yioop1/ to actually do the query
+The network=false tells http://localhost/git/yioop1/ to actually do the query
 lookup rather than make a network request.
 </p>
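+<p>
+You can imitate one of these internal requests by hand to test a single
+queue server directly. Assuming the parameters named above are the relevant
+ones, something like
+</p>
+<pre>
+curl "http://localhost/git/yioop2/?q=hello&amp;network=false&amp;raw=1"
+</pre>
+<p>
+should return (ungrouped) results from just the git/yioop2 instance, with no
+merging by the name server.
+</p>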
diff --git a/en-US/pages/resources.thtml b/en-US/pages/resources.thtml
index 26f3674..31d054c 100755
--- a/en-US/pages/resources.thtml
+++ b/en-US/pages/resources.thtml
@@ -1,8 +1,11 @@
 <h1>Resources</h1>
 <ul>
-<li><a href="/phpBB/">Discussion Boards</a></li>
+<li><a href="http://www.seekquarry.com/phpBB/">Discussion Boards</a></li>
 <li><a href="?c=main&amp;p=install">Install Guides</a></li>
-<li><a href="/mantis/">Issue Tracking</a></li>
-<li><a href="/yioop-docs/">PHPDocumentor docs for Yioop source code</a></li>
-<li><a href="/viewgit/">View Git of Yioop repository</a></li>
+<li><a href="?c=main&amp;p=coding">Coding Guidelines</a></li>
+<li><a href="http://www.seekquarry.com/mantis/">Issue Tracking</a></li>
+<li><a href="http://www.seekquarry.com/yioop-docs/"
+    >PHPDocumentor docs for Yioop source code</a></li>
+<li><a href="http://www.seekquarry.com/viewgit/"
+    >View Git of Yioop repository</a></li>
 </ul>