Seekquarry static pages for Version 0.72

Chris Pollett [2011-08-15]
Seekquarry static pages for Version 0.72
Filename
en-US/pages/about.thtml
en-US/pages/documentation.thtml
en-US/pages/downloads.thtml
diff --git a/en-US/pages/about.thtml b/en-US/pages/about.thtml
index f486d48..db7e43b 100755
--- a/en-US/pages/about.thtml
+++ b/en-US/pages/about.thtml
@@ -28,8 +28,9 @@ combined the two to get Yioop!</p>
 <h1>Additional Credits</h1>
 <p>
 Several people helped
-with localization: Mary Pollett, Thanh Bui, Youn Kim, Sugi Widjaja,
-Chao-Hsin Shih, Sujata Dongre, and Jonathan Ben-David. Thanks to
+with localization: Mary Pollett, Jonathan Ben-David,
+Thanh Bui, Sujata Dongre, Youn Kim, Chao-Hsin Shih,
+and Sugi Widjaja. Thanks to
 Ravi Dhillon for finding and helping with the fixes for Issue 15
 and Commit 632e46. Several of my master's students have done projects
 related to Yioop!: Amith Chandranna, Priya Gangaraju, and Vijaya Pamidi.
@@ -41,5 +42,7 @@ Vijaya developed a Firefox web traffic extension for Yioop!
 Her code is also obtainable from <a href="http://www.cs.sjsu.edu/faculty/<?php
 ?>pollett/masters/Semesters/Fall10/vijaya/index.shtml">Vijaya Pamidi's
 master's pages</a>. Priya's code served as the
-basis for the plugin feature currently in Yioop!
+basis for the plugin feature currently in Yioop! The following other
+students have created text processors for Yioop!: Nakul Natu (pptx),
+Vijeth Patil (epub), and Tarun Pepira (xlsx).
 </p>
diff --git a/en-US/pages/documentation.thtml b/en-US/pages/documentation.thtml
index 382632e..89257a4 100755
--- a/en-US/pages/documentation.thtml
+++ b/en-US/pages/documentation.thtml
@@ -1,5 +1,5 @@
 <div class="docs">
-<h1>Yioop! Documentation v 0.70</h1>
+<h1>Yioop! Documentation v 0.72</h1>
     <h2 id='toc'>Table of Contents</h2>
     <ul>
         <li><a href="#intro">Introduction</a></li>
@@ -11,8 +11,10 @@
         <li><a href="#userroles">Managing Users and Roles</a></li>
         <li><a href="#crawls">Managing Crawls</a></li>
         <li><a href="#mixes">Mixing Crawl Indexes</a></li>
+        <li><a href="#filter">Search Filter</a></li>
         <li><a href="#localizing">Localizing Yioop! to a New Language</a></li>
-        <li><a href="#hacking">Customizing Yioop!</a></li>
+        <li><a href="#customizing">Customizing Yioop!</a></li>
+        <li><a href="#commandline">Yioop! Command-line Tools</a></li>
         <li><a href="#references">References</a></li>
     </ul>

@@ -93,7 +95,7 @@
     operation is reasonably easy to distribute to many machines. Computing how
     relevant a word is to a document is another
     task that benefit from multi-round, distributed computation. When a document
-    is processed by indexers on multiple machine, words are extracted and a
+    is processed by indexers on multiple machines, words are extracted and a
     stemming algorithm such as [<a href="#P1980">P1980</a>] might be employed
     (a stemmer would extract the word jump from words such as jumps, jumping,
     etc). Next a statistic such as BM25F [<a href="#ZCTSR2004">ZCTSR2004</a>]
@@ -160,7 +162,11 @@
     the desired location under the web server's document folder, each
     fetcher is configured to know who the queue_server is, and finally,
     the fetcher's programs are run on each fetcher machine and the queue_server
-    is run of the coordinating machine.
+    is run on the coordinating machine. Since a multi-million page crawl might
+    take several days, Yioop! supports the ability to dynamically change its
+    crawl parameters while a crawl is going on. This allows a user, on request
+    from a web admin, to disallow Yioop! from continuing to crawl a site without
+    having to stop the overall crawl.
     </p>
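+    <p>As a rough illustration (not an exact transcript of the Options
+    form), disallowed site entries added during an active crawl might look
+    like the following; the domain: form matches the syntax shown in the
+    arc_tool.php output later in this document, while the full-url entry is
+    a hypothetical placeholder:</p>
+<pre>
+domain:arxiv.org
+http://www.some-busy-site.example.com/
+</pre>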
     <p>Despite its simpler model, Yioop! does a number of things to improve the
     quality of its search results. For each link extracted from a page,
@@ -244,7 +250,7 @@
     <a href="http://www.wikipedia.org/">Wikipedia</a> for archiving MediaWiki
     wikis. Wikipedia offers <a
     href="http://en.wikipedia.org/wiki/Wikipedia:Database_download">creative
-    common licenses downloads</a>
+    common-licensed downloads</a>
     of their site in this format. The <a href="http://www.dmoz.org/">Open
     Directory Project</a> makes available its <a
     href="http://www.dmoz.org/rdf.html">ODP data set</a> in an RDF-like format
@@ -265,7 +271,7 @@
     this introduction:
     </p>
     <ul>
-    <li>Yioop! is an open source distributed crawler and search engine
+    <li>Yioop! is an open-source, distributed crawler and search engine
     written in PHP.</li>
     <li>It is capable of crawling and indexing small sites to sites or
     collections of sites containing millions of documents.</li>
@@ -273,10 +279,16 @@
     downloads of pages.</li>
     <li>It has a web interface to select seed sites for crawls and set what
     sites crawls should not crawl.</li>
-    <li>It obeys robots.txt file including the Crawl-delay directive.</li>
+    <li>It obeys robots.txt files, including the Crawl-delay directive.
+    It also supports the robots meta tag.</li>
     <li>It supports open web crawls, but through its web interface one can
     configure it also to crawl only specifics site, domains, or collections
     of sites and domains. </li>
+    <li>Yioop! supports dynamically changing the allowed and disallowed
+    sites while a crawl is in progress.</li>
+    <li>It supports the indexing of many different file types, including
+    HTML, BMP, DOC, ePub, GIF, JPG, PDF, PPT, PPTX, PNG, RSS, RTF, sitemaps,
+    SVG, XLSX, and XML.</li>
     <li>Crawling, indexing, and serving search results can be done on a
     single machine or distributed across several machines.</li>
     <li>It uses a simplified distributed model that is straightforward to
@@ -300,10 +312,11 @@
     <li>Yioop! uses a web archive file format which makes it easy to
     copy crawl results amongst different machines. It has a command-line
     tool for inspecting these archives if they need to examined
-    in a non-search setting.</li>
+    in a non-web setting. It also supports command-line search querying
+    of these archives.</li>
     <li>Using web archives, crawls can be mirrored amongst several machines
     to speed-up serving search results. This can be further sped-up
-    by using memcache.</li>
+    by using memcache or filecache.</li>
     <li>A given Yioop! installation might have several saved crawls and
     it is very quick to switch between any of them and immediately start
     doing text searches.</li>
@@ -315,15 +328,16 @@

     <h2 id="required">Requirements</h2>
     <p>The Yioop! search engine requires: (1) a web server, (2) PHP 5.3 or
-    better, (3) Curl libraries for downloading web pages. To be a little more
-    specific Yioop! has been tested with Apache 2.2;
-    however, it should work with other webservers. For PHP, you need a build of
-    PHP that incorporates multi-byte string (mb_ prefixed) functions,
-    Curl, Sqlite, the GD graphics library and the
-    command-line interface. If you are using Mac OSX Snow Leopard,
-    the version of Apache2 and PHP that come with it suffice. For Windows,
-    Mac, and Linux another easy way to get the required software is to
-    download a Apache/PHP/MySql suite such as
+    better (Yioop! used only to serve search results from a pre-built index
+    has been tested to work in PHP 5.2), (3) Curl libraries for downloading
+    web pages. To be a little more specific, Yioop! has been tested with
+    Apache 2.2; however, it should work with other webservers. For PHP,
+    you need a build of PHP that incorporates multi-byte string (mb_ prefixed)
+    functions, Curl, Sqlite (or at least PDO with Sqlite driver),
+    the GD graphics library and the command-line interface. If you are using
+    Mac OSX Snow Leopard or Lion, the version of Apache2 and PHP that come
+    with it suffice. For Windows, Mac, and Linux, another easy way to get the
+    required software is to download an Apache/PHP/MySql suite such as
     <a href="http://www.apachefriends.org/en/xampp.html">XAMPP</a>. On Windows
     machines, find the the php.ini file under the php folder in your Xampp
     folder and change the line:</p>
@@ -356,11 +370,11 @@ extension=php_curl.dll
     <h3>Memory Requirements</h3>
     <p>In addition, to the prerequisite software listed above, Yioop! also
     has certain memory requirements. By default bin/queue_server.php
-    requires 1200MB, bin/fetcher.php requires 800MB, and index.php requires
+    requires 1000MB, bin/fetcher.php requires 750MB, and index.php requires
     200MB. These  values are set near the tops of each of these files in turn
     with a line like:</p>
 <pre>
-ini_set("memory_limit","800M");
+ini_set("memory_limit","1000M");
 </pre>
     <p>
     If you want to reduce these memory requirements, it is advisable to also
@@ -394,7 +408,7 @@ your machine. In the case above, the web server needs permissions on the
 file configs/config.php to write in the value of the directory you choose in the
 form for the Work Directory. Another common message asks you to make sure the
 web server has permissions on the place where this auxiliary
-folder needs to be created. When filling out the form othis page, on both
+folder needs to be created. When filling out the form on this page, on both
 *nix-like, and Windows machines, you should use forward slashes for the folder
 location. For example,
 </p>
@@ -423,7 +437,7 @@ checked, will cause statistics about the time, etc. of database queries
 to be displayed at the bottom of each web page. The last checkbox,
 Test Info, says whether or not to display automated tests of some of the
 systems library classes if the browser is navigated to
-http://YIOOP_INSTALLATION/tests/. Again, none of these debug settings should
+http://YIOOP_INSTALLATION/tests/. None of these debug settings should
 be checked in a production environment.
 </p>
 <p>The <b>Database Set-up</b> fieldset is used to specify what database management
@@ -476,19 +490,24 @@ search results from crawls that you have already done, then this
 fieldset can be filled in however you want.</li>
 <li>If you are doing crawling on only one machine, you can put
 http://localhost/path_to_yioop/ or
-http://127.0.0.1/path_to_yioop/, where you appropriate modify
+http://127.0.0.1/path_to_yioop/, where you appropriately modify
 "path_to_yioop".</li>
-<li>Otherwise, if you are doing a crawl on multiple machines, put
-the url to the machine that will act as the queue_server.</li>
+<li>Otherwise, if you are doing a crawl on multiple machines, use
+the url of Yioop! on the machine that will act as the queue_server.</li>
 </ol>
 <p>In communicating between the fetcher and the server, Yioop! uses
 curl. Curl can be particular about redirects in the case where posted
 data is involved. i.e., if a redirect happens, it does not send posted
 data to the redirected site. For this reason, Yioop! insists on a trailing
 slash on your queue server url. Beneath the Queue Server Url
-field, is a Memcached checkbox. Checking this allows you to specify
-memcache servers that, if specified, will be used to cache in memory search
-query results as well as index pages that have been accessed.</p>
+field are a Memcached checkbox and a Filecache checkbox. Only one of these
+can be checked at a time. Checking the Memcached checkbox allows you to specify
+memcached servers that, if specified, will be used to cache in memory search
+query results as well as index pages that have been accessed. Checking the
+Filecache box tells Yioop! to cache search query results in temporary files.
+Memcached probably gives a better performance boost than Filecaching, but
+not all hosting environments have Memcached available.
+</p>
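+<p>As a minimal sketch, assuming memcached is already installed on the
+machine that will hold the cache (the exact flags depend on your setup),
+a memcached server that Yioop! could then be pointed at might be started
+with something like:</p>
+<pre>
+memcached -d -m 64 -p 11211
+</pre>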
 <p>
 The last fieldset is the <b>Crawl Robot Set-up</b> fieldset. This is used
 to provide websites that you crawl with information about who is crawling them.
@@ -530,36 +549,37 @@ given in the sections starting from the The Yioop! User Interface section.
 The files fetcher.php and queue_server.php are only connected with crawling
 the web. If one already has a stored crawl of the web, then you no longer
 need to run or use these programs. For instance, you might obtain a crawl of
-the web on your home machine and upload the crawl to your ISP hosting
-your website with an instance of Yioop! running on it. This website could
+the web on your home machine and upload the crawl to an
+instance of Yioop! on the ISP hosting your website. This website could
 serve search results without making use of either fetcher.php or
 queue_server.php. To perform a web crawl you need to use both
 of these programs however as well as the Yioop! web site. This is explained in
 detail in the section Managing Crawls.
 </p>
-    <p>The Yioop! folder itself consists of several files and sub-folders.
+<p>The Yioop! folder itself consists of several files and sub-folders.
 The file index.php as mentioned above is the main entry point into the Yioop!
 web application. yioopbar.xml is the xml file specifying how to access
 Yioop as an Open Search Plugin. favicon.ico is used to display the little
 icon in the url bar of a browser when someone browses to the Yioop! site.
-A URL to the another file bot.php is given by the Yioop! robot
+A URL to the file bot.php is given by the Yioop! robot
 as it crawls websites so that website owners can find out information
 about who is crawling their sites. Here is a rough guide to what
-the Yioop! folder's sub-folder contain:
+the Yioop! folder's various sub-folders contain:
 <dl>
-<dt>bin</dt><dd>This folder is intended to hold command line scripts
+<dt>bin</dt><dd>This folder is intended to hold command-line scripts
 which are used in conjunction with Yioop! In addition to the fetcher.php
 and queue_server.php script already mentioned, it contains arc_tool.php
-which can be used to examine the contents of WebArchiveBundle's and
-IndexArchiveBundle's from the command line.</dd>
+and query_tool.php. The former can be used to examine the contents of
+WebArchiveBundle's and IndexArchiveBundle's from the command line; the latter
+can be used to run queries from the command line.</dd>
 <dt>configs</dt><dd>This folder contains configuration files. You will
 probably not need to edit any of these files directly as you can set the most
 common configuration settings from with the admin panel of Yioop! The file
-config.php controls a number of parameters about how data is stored, how and how
-often the queue_server and fetchers communicate, and which file types are
-supported by Yioop! createdb.php can be used to create a bare instance of the
-Yioop! database with a root admin user having no password. This script is not
-strictly necessary as the database should be creatable via the admin panel.
+config.php controls a number of parameters about how data is stored, how,
+and how often, the queue_server and fetchers communicate, and which file types
+are supported by Yioop! createdb.php can be used to create a bare instance of
+the Yioop! database with a root admin user having no password. This script is
+not strictly necessary as the database should be creatable via the admin panel.
 The file default_crawl.ini is copied to WORK_DIRECTORY after you set this
 folder in the admin/configure panel. There it is renamed as crawl.ini and
 serves as the initial set of sites to crawl until you decide to change these.
@@ -585,7 +605,7 @@ classes for things like indexing, storing data to files, parsing urls, etc.
 lib contains six subfolders: <i>archive_bundle_iterators</i>,
 <i>compressors</i>, <i>index_bundle_iterators</i>, <i>indexing_plugins</i>,
 <i>processors</i>, and <i>stemmers</i>. The <i>archive_bundle_iterators</i>
-folder has iterator for iterating over the objects of various kinds of
+folder has iterators for iterating over the objects of various kinds of
 web archive file formats, such as arc, wiki-media, etc.
 These iterators are used to iterate over such archives during
 a recrawl. The <i>compressors</i> folder contains classes that might be used
@@ -613,7 +633,7 @@ as <a href="http://memcached.org/">memcached</a> or <a
 href="http://php.net/manual/en/book.apc.php">apc</a>. Besides the file
 configure.ini, there is a statistics.txt file which has info about what
 percentage of the id's have been translated. Finally, although not used in the
-default Yioop! system. It is possible for a given locale folder to have a
+default Yioop! system, it is possible for a given locale folder to have a
 sub-folder pages with translation of static pages used by a Yioop! installation.
 </dd>
 <dt>models</dt><dd>This folder contains the subclasses of Model used by
@@ -661,8 +681,8 @@ In the event that you upgrade your Yioop! installation you should only
 need to replace the Yioop! application folder and in the configuration
 process of Yioop! tell it where your WORK DIRECTORY is. Of course, it
 is always recommended to back up one's data before performing an upgrade.
-Within the WORK DIRECTORY, Yioop! stores three main files: profile.php,
-crawl.ini, and bot.txt. Here is a rough guide to what
+Within the WORK DIRECTORY, Yioop! stores four main files: profile.php,
+crawl.ini, bot.txt, and robot_table.txt. Here is a rough guide to what
 the WORK DIRECTORY's sub-folder contain:
     </p>
 <dl>
@@ -698,6 +718,13 @@ Finally, ScheduleData folders have data about found urls that could
 eventually be scheduled to crawl. Within each of these three kinds of folders
 there are typical many sub-folders, one for each day data arrived, and within
 these subfolders there are files containing the respective kinds of data.</dd>
+<dt>search_filters</dt><dd>This folder is used to store text files
+containing data for the global after-crawl search filter. The global search
+filter allows a user to specify, after a crawl is done, that certain
+urls be removed from the search results.</dd>
+<dt>temp</dt><dd>This is used for storing temporary files that Yioop!
+creates during the crawl process, for example, temporary files used while
+making thumbnails.</dd>
 </dl>
     <p><a href="#toc">Return to table of contents</a>.</p>

@@ -716,22 +743,23 @@ clicking the Search button. A typical search results might look like:
 </p>
 <img src='resources/SearchResults.png' alt='Example Search Results'
 width="70%"/>
-<p>For each result back from the query, the title is a link to the page
-that matches the query term. This is followed by a brief summary of
-that page with the query words bolded. Then the document rank, relevancy,
-proximity, and overall scores are listed. Each of these results
-is a grouped statistic, several "micro index entry" are grouped together/summed
-to create each. So even though
+<p>Each result back from the query consists of several parts:
+First comes a title, which is a link to the page that matches the query term.
+This is followed by a brief summary of that page with the query words in bold.
+Then the document rank, relevancy,
+proximity, and overall scores are listed. Each of these numbers
+is a grouped statistic -- several "micro index entries" are grouped
+together/summed to create each. So even though
 a given "micro index entry" might have a document rank between 1 and 10 there
 sum could be a larger value. Further, the overall score is a
 generalized inner product of the scores of the "micro index entries",
 so the separated scores will not typically sum to the overall score.
 After these scores there are three links:
-Cached, Similar, and InLinks. Clicking on Cached will display Yioop's downloaded
+Cached, Similar, and Inlinks. Clicking on Cached will display Yioop's downloaded
 copy of the page in question. We will describe this in more detail
 in a moment. Clicking on Similar causes Yioop! to locate the five
 words with the highest relevancy scores for that document and then to perform
-a search on those words. Clicking on InLinks will take you to a page
+a search on those words. Clicking on Inlinks will take you to a page
 consisting of all the links that Yioop! found to the document in question.
 Finally, clicking on an IP address link returns all documents that were
 crawled from that IP address.</p>
@@ -747,10 +775,10 @@ is moved off the network on which fetcher lives, then the look up of a
 cached page might fail. On the cached page there is a "Toggle
 extracted summary" link. Clicking this will show the title, summary, and
 links that were extracted from the full page and indexed. No other terms
-on the page could be used to locate the page via a search query. This
-can be viewed as an "SEO" view of the page</p>
+on the page are used to locate the page via a search query. This
+can be viewed as an "SEO" view of the page.</p>
 <img src='resources/CacheSEO.png' alt='Example Cache SEO Results'
-width="70%"/>>
+width="70%"/>
 <p>A basic query to the Yioop! search form is typically a sequence of
 words seperated by whitespace. This will cause Yioop! to compute a
 "conjunctive query", it will look up only those documents which contain all of
@@ -930,21 +958,27 @@ activities in turn.
     is clicked the "tiers" of data in this dictionary need to be logarithmically
     merged, this process can take a couple of minutes, so after clicking stop
     do not kill the queue_server (if you were going to) until after it says
-    waiting for messages again. Finally, at the bottom of the page is a table
-    listing previously run crawls.
+    waiting for messages again. Beneath
+    this stop button line is a link which allows you to change the
+    crawl options of the currently active crawl. Changing the options on
+    an active crawl may take some time to fully take effect, as the currently
+    processing queue of urls needs to flush.
+    At the bottom of the page is a table listing previously run crawls.
     Next to each previously run crawl are three links. The first link lets you
-    resume this crawl. This will cause Yioop! to look for unprocessed fetcher
+    resume this crawl, if this is possible; otherwise, the link says Closed.
+    Resume will cause Yioop! to look for unprocessed fetcher
     data regarding that crawl, and try to load that into a fresh priority
     queue of to crawl urls. If it can do this, crawling would continue.
-    The second let's you set this crawl's result as the default index. In the
-    above picture that was only one saved crawl and it is already set as the
-    default index. When someone comes to your Yioop! installation and does
-    not adjust their settings, the default index is used to compute search
-    results. The final link allows one to Delete the crawl. For both resuming a
-    crawl and deleting a crawl, it might take a little while before you see the
-    process being reflected in the display. This is because communication
-    might need to be done with the various fetchers, and because the on screen
-    display refreshes only every 20 seconds or so.
+    The second link lets you set this crawl's result as the default index.
+    In the above picture there were only two saved crawls, the second of which
+    was set as the default index. When someone comes to your Yioop!
+    installation and does not adjust their settings, the default index is
+    used to compute search results. The final link allows one to Delete the
+    crawl. For both resuming a crawl and deleting a crawl, it might take a
+    little while before you see the process being reflected in the display.
+    This is because communication might need to be done with the various
+    fetchers, and because the on screen display refreshes only every 20 seconds
+    or so.
     </p>
     <h3>Prerequisites for Crawling</h3>
     <p>Before you can start a new crawl, you need to run the queue_server.php
@@ -1032,9 +1066,16 @@ php fetcher.php stop</pre>
     respectively. It is completely possible to copy these subfolders to
     a SSD and use symlinks to them under the original crawl directory to
     enhance Yioop!'s search performance.</p>
-    <h3>Specifying Crawl Options</h3>
+    <h3>Specifying Crawl Options and Modifying Options of the Active Crawl</h3>
     <p>As we pointed out above, next to the Start Crawl button is an Options
-    link. Clicking on this link, should display the following activity:</p>
+    link. Clicking on this link lets you set various aspects of how
+    the next crawl should be conducted. As we mentioned before, if there is
+    a currently processing crawl there will be an options link under its stop
+    button. Both of these links lead to similar pages; however, for an active
+    crawl fewer parameters can be changed. So we will only describe the first
+    link. In the case of clicking the Options
+    link next to the Start Crawl button, the user should be taken to an
+    activity screen which looks like:</p>
 <img src='resources/WebCrawlOptions.png' alt='Web Crawl Options Form'/>
     <p>The Back link in the corner returns one to the previous activity.</p>
     <p>There are two kinds of crawls that can be performed by Yioop!
@@ -1236,7 +1277,27 @@ OdpRdfArchiveBundle
     button beneath this drop-down must be clicked.
     </p>
     <p><a href="#toc">Return to table of contents</a>.</p>
-
+    <h2 id='filter'>Search Filter</h2>
+    <p>The disallowed sites crawl option allows a user to specify that they
+    don't want Yioop! to crawl a given web site. After a crawl is done,
+    though, one might be asked to remove a website from the crawl results,
+    or one might want to remove a website from the crawl results because it
+    has questionable content. A large crawl can take days to replace. To
+    make the job of doing such filtering faster while one is waiting for
+    a replacement crawl in which the site has been disallowed, one can use
+    a search filter.</p>
+<img src='resources/SearchFilter.png' alt='The Search Filter form'/>
+    <p>Clicking on the Search Filter activity brings one to a screen
+    like the one above. Here one can specify a list of hosts which should be
+    excluded from the search results. The sites listed in the
+    Sites to Filter text area are required to be hostnames. Using
+    a filter, any web page with the same host name as one listed in
+    the Sites to Filter area will not appear in the search results. For example,
+    the filter settings in the example image above contain the line
+    http://www.cs.sjsu.edu/; given these settings, the web page
+    http://www.cs.sjsu.edu/faculty/pollett/ would not appear in search
+    results.</p>
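+    <p>As a hypothetical example of what the Sites to Filter text area
+    might contain (the first line is taken from the image above; the second
+    is an invented placeholder host), the filter list could look like:</p>
+<pre>
+http://www.cs.sjsu.edu/
+http://www.somehost.example.com/
+</pre>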
+    <p><a href="#toc">Return to table of contents</a>.</p>
     <h2 id='localizing'>Localizing Yioop! to a New Language</h2>
     <p>The Manage Locales activity can be used to configure Yioop
     for use with different languages and for different regions. The
@@ -1316,7 +1377,7 @@ OdpRdfArchiveBundle
     </p>
     <p><a href="#toc">Return to table of contents</a>.</p>

-    <h2 id='hacking'>Customizing Yioop!</h2>
+    <h2 id='customizing'>Customizing Yioop!</h2>
     <p>One advantage of an open-source project is that you have complete
     access to the source code. Thus, you can modify Yioop! to fit in
     with your existing project or add new feel free to add new features to
@@ -1393,67 +1454,7 @@ OdpRdfArchiveBundle
     will need to edit the models/profile_model.php file and modify
     the method migrateDatabaseIfNecessary($dbinfo) to say how
     AUTOINCREMENT columns should be handled.</p>
-    <h3>Examining the contents of WebArchiveBundle's and
-    IndexArchiveBundles's</h3>
-    <p>
-    The command-line script bin/arc_tool.php can be use to examine the
-    contents of a WebArchiveBundle or an IndexArchiveBundle. i.e., it gives
-    a print out of the web pages or summaries contained therein. It can also
-    be used to give information from the headers of these bundles. It is
-    run from the command-line with the syntaxes:
-    </p>
-    <pre>
-php arc_tool.php info bundle_name
-    //return info about documents stored in archive.
-php arc_tool.php list bundle_name start num
-    //outputs items start through num from bundle_name
-   </pre>
-   <p>For example,</p>
-   <pre>
-|chris-polletts-macbook-pro:bin:158&gt;php arc_tool.php info /Applications/XAMPP/xamppfiles/htdocs/crawls/cache/IndexData1293767731

-Bundle Name: IndexData1293767731
-Bundle Type: IndexArchiveBundle
-Description: test
-Number of generations: 1
-Number of stored links and documents: 267260
-Number of stored documents: 16491
-Crawl order was: Page Importance
-Seed sites:
-   http://www.ucanbuyart.com/
-   http://www.ucanbuyart.com/fine_art_galleries.html
-   http://www.ucanbuyart.com/indexucba.html
-Sites allowed to crawl:
-   domain:ucanbuyart.com
-   domain:ucanbuyart.net
-Sites not allowed to be crawled:
-   domain:arxiv.org
-   domain:ask.com
-Meta Words:
-   http://www.ucanbuyart.com/(.+)/(.+)/(.+)/(.+)/
-
-|chris-polletts-macbook-pro:bin:159&gt;
-|chris-polletts-macbook-pro:bin:202&gt;php arc_tool.php list /Applications/XAMPP/xamppfiles/htdocs/crawls/cache/Archive1293767731 0 3
-
-BEGIN ITEM, LENGTH:21098
-[URL]
-http://www.ucanbuyart.com/robots.txt
-[HTTP RESPONSE CODE]
-404
-[MIMETYPE]
-text/html
-[CHARACTER ENCODING]
-ASCII
-[PAGE DATA]
-&lt;!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
-  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"&gt;
-
-&lt;html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"&gt;
-
-&lt;head&gt;
-    &lt;base href="http://www.ucanbuyart.com/" /&gt;
-   &lt;/pre&gt;
-</pre>
     <h3>Writing an Indexing Plugin</h3>
     <p>An indexing plugin provides a way that an advanced end-user
     can extend the indexing capabilities of Yioop! Bundled with
@@ -1526,7 +1527,118 @@ ASCII
     </pre>
     <p>This completes the discussion of how to write an indexing plugin.</p>
     <p><a href="#toc">Return to table of contents</a>.</p>
+    <h2 id='commandline'>Yioop! Command-line Tools</h2>
+    <h3>Examining the contents of WebArchiveBundle's and
+    IndexArchiveBundle's</h3>
+    <p>
+    The command-line script bin/arc_tool.php can be used to examine the
+    contents of a WebArchiveBundle or an IndexArchiveBundle. That is, it gives
+    a printout of the web pages or summaries contained therein. It can also
+    be used to give information from the headers of these bundles. It is
+    run from the command line with the following syntaxes:
+    </p>
+    <pre>
+php arc_tool.php info bundle_name
+    //return info about documents stored in archive.
+php arc_tool.php list bundle_name start num
+    //outputs items start through num from bundle_name
+   </pre>
+   <p>For example,</p>
+   <pre>
+|chris-polletts-macbook-pro:bin:158&gt;php arc_tool.php info /Applications/XAMPP/xamppfiles/htdocs/crawls/cache/IndexData1293767731

+Bundle Name: IndexData1293767731
+Bundle Type: IndexArchiveBundle
+Description: test
+Number of generations: 1
+Number of stored links and documents: 267260
+Number of stored documents: 16491
+Crawl order was: Page Importance
+Seed sites:
+   http://www.ucanbuyart.com/
+   http://www.ucanbuyart.com/fine_art_galleries.html
+   http://www.ucanbuyart.com/indexucba.html
+Sites allowed to crawl:
+   domain:ucanbuyart.com
+   domain:ucanbuyart.net
+Sites not allowed to be crawled:
+   domain:arxiv.org
+   domain:ask.com
+Meta Words:
+   http://www.ucanbuyart.com/(.+)/(.+)/(.+)/(.+)/
+
+|chris-polletts-macbook-pro:bin:159&gt;
+|chris-polletts-macbook-pro:bin:202&gt;php arc_tool.php list /Applications/XAMPP/xamppfiles/htdocs/crawls/cache/Archive1293767731 0 3
+
+BEGIN ITEM, LENGTH:21098
+[URL]
+http://www.ucanbuyart.com/robots.txt
+[HTTP RESPONSE CODE]
+404
+[MIMETYPE]
+text/html
+[CHARACTER ENCODING]
+ASCII
+[PAGE DATA]
+&lt;!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"&gt;
+
+&lt;html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"&gt;
+
+&lt;head&gt;
+    &lt;base href="http://www.ucanbuyart.com/" /&gt;
+   &lt;/pre&gt;
+</pre>
+    <h3>Querying an Index from the command line</h3>
+<p>The command-line script bin/query_tool.php can be used to query
+indices in the Yioop! WORK_DIRECTORY/cache. This tool can be used
+on an index regardless of whether or not Apache is running. It can be
+used for long-running queries that might time out when run within a browser,
+putting their results into memcache or filecache. The command-line arguments
+for the query tool are:
+</p>
+<pre>
+php query_tool.php query num_results start_num lang_tag
+</pre>
+<p>The default num_results is 10, start_num is 0, and lang_tag is en-US.
+The following shows how one could do a query on "Chris Pollett":
+</p>
+<pre>
+|chris-polletts-macbook-pro:bin:141&gt;php query_tool.php "Chris Pollett"
+
+============
+TITLE: ECCC - Pointers to
+URL: http://eccc.hpi-web.de/static/pointers/personal_www_home_pages_of_complexity_theorists/
+IPs: 141.89.225.3
+DESCRIPTION: Homepage of the Electronic Colloquium on Computational Complexity located
+at the Hasso Plattner Institute of Potsdam, Germany Personal WWW pages of
+complexity people 2011 2010 2009 2011...1994 POINTE
+Rank: 3.9551158411
+Relevance: 0.492443777769
+Proximity: 1
+Score: 4.14
+============
+
+============
+TITLE: ECCC - Pointers to
+URL: http://www.eccc.uni-trier.de/static/pointers/personal_www_home_pages_of_complexity_theorists/
+IPs: 141.89.225.3
+DESCRIPTION: Homepage of the Electronic Colloquium on Computational Complexity located
+at the Hasso Plattner Institute of Potsdam, Germany Personal WWW pages of
+complexity people 2011 2010 2009 2011...1994 POINTE
+Rank: 3.886318974
+Relevance: 0.397622570289
+Proximity: 1
+Score: 4.03
+============
+
+.....
+</pre>
+<p>The index the results are returned from is the default index; however,
+all of the Yioop! meta words should work, so you can do queries like
+"my_query i:timestamp_of_index_want". Query results depend on the
+kind of language stemmer being used, so French results might be better
+if one specifies fr-FR than if one relies on the default en-US.</p>
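+<p>For instance, combining the syntax above with the i: meta word and the
+index timestamp from the earlier arc_tool.php example (a sketch, not output
+from an actual run), one might issue:</p>
+<pre>
+php query_tool.php "chris pollett i:1293767731" 5 0 en-US
+</pre>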
     <h2 id="references">References</h2>
     <dl>
 <dt id="APC2003">[APC2003]</dt>
diff --git a/en-US/pages/downloads.thtml b/en-US/pages/downloads.thtml
index 2132ddd..edb77aa 100755
--- a/en-US/pages/downloads.thtml
+++ b/en-US/pages/downloads.thtml
@@ -2,12 +2,12 @@
 <h2>Yioop! Releases</h2>
 <p>The Yioop! source code is still at an alpha stage. </p>
 <ul>
+<li><a href="http://www.seekquarry.com/viewgit/?a=archive&p=yioop&h=086ed7d360f3c6739bb41760805808682304d207&hb=eb5d93dae340a03e50c78ebd16eebecf01266315&t=zip"
+    >Version 0.72-ZIP</a></li>
+</li>
 <li><a href="http://www.seekquarry.com/viewgit/?a=archive&p=yioop&h=285678274124065f7071992f3c506d354f759379&hb=01e0c5cfa14265c88174ddd635542f72ddc3dac7&t=zip"
     >Version 0.701-ZIP</a></li>
 </li>
-<li><a href="http://www.seekquarry.com/viewgit/?a=archive&p=yioop&h=2c08046b95bb12ad08cc97323e5932a83130fe2d&hb=ac7fb82687b8724230040162e97774f18333d7a7&t=zip"
-    >Version 0.68-ZIP</a></li>
-</li>
 </ul>
 <h2>Git Repository</h2>
 <p>The Yioop! git repository allows anonymous read-only access. If you would to