Updated docs for version 0.74

Chris Pollett [2011-09-09 17:Sep:th]

Filename
en-US/pages/documentation.thtml
en-US/pages/downloads.thtml

diff --git a/en-US/pages/documentation.thtml b/en-US/pages/documentation.thtml
index 6232c9b..0cd8a23 100755
--- a/en-US/pages/documentation.thtml
+++ b/en-US/pages/documentation.thtml
@@ -1,5 +1,5 @@
 <div class="docs">
-<h1>Yioop! Documentation v 0.721</h1>
+<h1>Yioop! Documentation v 0.74</h1>
     <h2 id='toc'>Table of Contents</h2>
     <ul>
         <li><a href="#intro">Introduction</a></li>
@@ -17,7 +17,7 @@
         <li><a href="#commandline">Yioop! Command-line Tools</a></li>
         <li><a href="#references">References</a></li>
     </ul>
-
+
     <h2 id="intro">Introduction</h2>
     <p>The Yioop! search engine is designed to allow users
     to produce indexes of a web-site or a collection of
@@ -300,7 +300,8 @@
     is used.</li>
     <li>Yioop! supports a GUI interface which makes
     it easy to combine results from several crawl indexes to create unique
-    result presentations.</li>
+    result presentations. These combinations can be done in a conditional
+    manner using "if:" meta words.</li>
     <li>Indexing occurs as crawling happens, so when a crawl is stopped,
     it is ready to be used to handle search queries immediately.</li>
     <li>Yioop! supports an indexing plugin architecture to make it
@@ -372,8 +373,8 @@ extension=php_curl.dll
     <h3>Memory Requirements</h3>
     <p>In addition, to the prerequisite software listed above, Yioop! also
     has certain memory requirements. By default bin/queue_server.php
-    requires 1000MB, bin/fetcher.php requires 750MB, and index.php requires
-    200MB. These  values are set near the tops of each of these files in turn
+    requires 1000MB, bin/fetcher.php requires 850MB, and index.php requires
+    500MB. These values are set near the top of each of these files in turn
     with a line like:</p>
 <pre>
 ini_set("memory_limit","1000M");
@@ -819,45 +820,72 @@ the word "of" but not containing the word "the".</li>
 <b>link:url</b>, <b>ip:ip_address</b> are equivalent to having clicked on the
 Similar, Cached, InLinks, IP address links, respectively, on a summary with
 that url and ip address.</li>
-<li><b>site:url</b> or <b>site:host</b> returns all of the summaries of
-pages found at that url or on that host.
-</li>
-<li><b>info:url</b> returns the summary in the Yioop! index for the given url.
-</li>
+</ul>
+<p>The remaining query types we list in alphabetical order:</p>
+<ul>
+<li><b>date:Y</b>, <b>date:Y-M</b>, <b>date:Y-M-D</b>
+returns summaries of all documents crawled on the given date.
+For example, <i>date:2011-01</i> returns all documents crawled in
+January, 2011.</li>
 <li><b>filetype:extension</b> returns summaries of all documents found
 with the given extension. So a search: <em>Chris Pollett filetype:pdf</em>
 would return all documents containing the words Chris and Pollett and with
 extension pdf.</li>
+<li><b>index:timestamp</b> or <b>i:timestamp</b> causes the search to
+make use of the IndexArchive with the given timestamp. So a search like:
+<em>Chris Pollett i:1283121141 | Chris Pollett</em>
+takes results from the index with timestamp 1283121141 for
+Chris Pollett and unions them with results for Chris Pollett in the default
+index.</li>
+<li><b>if:keyword!add_keywords_on_true!add_keywords_on_false</b> checks the
+current conjunctive query clause for "keyword"; if present, it adds
+"add_keywords_on_true" to the clause, else it adds the keywords
+"add_keywords_on_false".  This meta word is typically used as part of a
+crawl mix. The else condition does not need to be present. As an example,
+<em>if:oracle!info:http://oracle.com/!site:none</em> might be added to
+a crawl mix so that if a query had the keyword oracle then the site
+http://oracle.com/ would be returned by the given query clause. As part
+of a larger crawl mix this could be used to make oracle's homepage appear
+at the top of the query results. If you would like to inject multiple
+keywords then separate the keywords using plus rather than white space.
+For example, <i>if:corvette!fast+car</i>.</li>
+<li><b>info:url</b> returns the summary in the Yioop! index for the given url
+only.
+</li>
+<li><b>lang:IETF_language_tag</b>  returns summaries of all documents
+whose language can be determined to match the given language tag.
+For example, <i>lang:en-US</i>.</li>
 <li><b>media:kind</b> returns summaries of all documents found
 of the given media kind. Currently, the text and images are the two
 supported media kinds. So one can add to the
 search terms <em>media:images</em> to get only image results matching
 the query keywords.</li>
+<li><b>mix:name</b> or <b>m:name</b> tells Yioop! to use the crawl mix "name"
+when computing the results of the query. The section on mixing crawl indexes has
+more details about crawl mixes. If the name of the original mix had spaces,
+for example, <i>cool mix</i> then to use the mix you would need to replace
+the spaces with plusses, <i>m:cool+mix</i>.</li>
+<li><b>modified:Y</b>, <b>modified:Y-M</b>, <b>modified:Y-M-D</b>
+returns summaries of all documents which were last modified on the given date.
+For example, <i>modified:2010-02</i> returns all documents that were last
+modified in February, 2010.</li>
+<li><b>os:operating_system</b>  returns summaries of all documents
+served on servers using the given operating system. For example,
+<i>os:centos</i>; make sure to use lower case.</li>
 <li><b>server:web_server_name</b> returns summaries of all documents
 served on that kind of web server. For example, <i>server:apache</i>.</li>
+<li><b>site:url</b>, <b>site:host</b>, or <b>site:domain</b> returns all of
+the summaries of pages found at that url, host, or domain. As an example,
+<em>site:http://prints.ucanbuyart.com/lithograph_art.html</em>,
+<em>site:http://prints.ucanbuyart.com/</em>,
+<em>site:prints.ucanbuyart.com</em>, <em>site:.ucanbuyart.com</em>,
+<em>site:ucanbuyart.com</em>, <em>site:com</em> will all return results with
+decreasing specificity. To return all pages listed in a Yioop! index you can
+do <i>site:all</i>.
+</li>
 <li><b>version:version_number</b> returns summaries of all documents
 served on web servers with the given version number.
 For example, one might have a query <i>server:apache version:2.2.9</i>.</li>
-<li><b>os:operating_system</b>  returns summaries of all documents
-served on servers using the given operating system. For example,
-<i>os:centos</i>, make sure to use lower case.</li>
-<li><b>lang:IETF_language_tag</b>  returns summaries of all documents
-whose language can be determined to match the given language tag.
-For example, <i>lang:en-US</i>.</li>
-<li><b>date:Y</b>, <b>date:Y-M</b>, <b>date:Y-M-D</b>
-returns summaries of all documents crawled on the given date.
-For example, <i>date:2011-01</i> returns all document crawled in
-January, 2011.</li>
-<li><b>modified:Y</b>, <b>modified:Y-M</b>, <b>modified:Y-M-D</b>
-returns summaries of all documents which were last modified on the given date.
-For example, <i>modified:2010-02</i> returns all document which were last
-modifed in February, 2010.</li>
-<li><b>index:timestamp</b> or <b>i:timestamp</b> causes the search to
-make use of the IndexArchive with the given timestamp. So a search like:
-<em>Chris Pollett i:1283121141 | Chris Pollett</em>
-take results from the index with timestamp 1283121141 for
-Chris Pollett and unions them with results for Chris Pollett in the default
-index</li>
 <li><b>weight:some_number</b> or <b>w:some_number</b> has the effect of
 multiplying all score for this portion of a query by some_number. For example,
 <em>Chris Pollett | Chris Pollett site:wikipedia.org w:5</em>
@@ -1248,7 +1276,9 @@ OdpRdfArchiveBundle
     first column has the name of the mix, the second column says how the
     mix is built out of component crawls, and the actions columns allows you
     to edit the mix, set it as the default index for Yioop! search results, or
-    delete the mix. When you create a new mix it also shows up on the Settings
+    delete the mix. You can also append "m:name+of+mix" or "mix:name+of+mix"
+    to a query to use that mix without having to set it as the default index.
+    When you create a new mix it also shows up on the Settings
     page. Creating a new mix or editing an existing mix sends you to a second
     page:</p>
     <img src='resources/EditMix.png' alt='The Edit Mixes form'/>
@@ -1281,9 +1311,12 @@ OdpRdfArchiveBundle
     first group above, the only crawl is test, it has a weight of 1. The
     keywords we inject for this crawl are media:text. This means we will
     get whatever results from this crawl that consisted of text rather than
-    image pages. The last link in a crawl row allows you to delete a crawl
-    from a crawl group. For changes on this page to take effect, the "Save"
-    button beneath this drop-down must be clicked.
+    image pages. Keywords can be used to make a particular component of
+    a crawl mix behave in a conditional manner by using the "if:" meta word
+    described in the search and user interface section. The last link in a
+    crawl row allows you to delete a crawl from a crawl group. For changes on
+    this page to take effect, the "Save" button beneath this drop-down must
+    be clicked.
     </p>
     <p><a href="#toc">Return to table of contents</a>.</p>
     <h2 id='filter'>Search Filter</h2>
@@ -1602,8 +1635,8 @@ ASCII
 <p>    The command-line script bin/query_tool.php can be use to query
 indices in the Yioop! WORK_DIRECTORY/cache. This tool can be used
 on an index regardless of whether or not Apache is running. It can be
-used for long running queries to put query results into memcache or filecache
-that might timeout when run within a browser. The command-line arguments
+used for long running queries that might timeout when run within a browser
+to put their results into memcache or filecache. The command-line arguments
 for the query tool are:
 </p>
 <pre>
diff --git a/en-US/pages/downloads.thtml b/en-US/pages/downloads.thtml
index dfbdefe..1e8af84 100755
--- a/en-US/pages/downloads.thtml
+++ b/en-US/pages/downloads.thtml
@@ -2,12 +2,12 @@
 <h2>Yioop! Releases</h2>
 <p>The Yioop! source code is still at an alpha stage. </p>
 <ul>
+<li><a href="http://www.seekquarry.com/viewgit/?a=archive&p=yioop&h=a1f11ce82d47abb95a30b7801f3d1ed2d8259489&hb=03dcb447966793b512a30fb55426f73e79a605b3&t=zip"
+    >Version 0.74-ZIP</a></li>
+</li>
 <li><a href="http://www.seekquarry.com/viewgit/?a=archive&p=yioop&h=c4aa1557604578a2b7c9b801c71a831a20242ffb&hb=6fd42f91a0de1c542f89556accb7ff44713efe28&t=zip"
     >Version 0.721-ZIP</a></li>
 </li>
-<li><a href="http://www.seekquarry.com/viewgit/?a=archive&p=yioop&h=285678274124065f7071992f3c506d354f759379&hb=01e0c5cfa14265c88174ddd635542f72ddc3dac7&t=zip"
-    >Version 0.701-ZIP</a></li>
-</li>
 </ul>
 <h2>Git Repository</h2>
 <p>The Yioop! git repository allows anonymous read-only access. If you would to
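
Note on the "if:" meta word documented in this commit: the behaviour described in documentation.thtml can be read as a small rewrite rule on one conjunctive query clause. The Python sketch below illustrates that reading only. It is not Yioop!'s implementation (Yioop! is written in PHP), and the function name expand_if_meta_words, the whitespace tokenisation, and the case-insensitive exact match on the keyword are all assumptions made for illustration.

```python
def expand_if_meta_words(clause: str) -> str:
    """Expand if:keyword!on_true!on_false terms in one conjunctive clause.

    Hypothetical helper: the name and matching rules are assumptions,
    not taken from the Yioop! source.
    """
    terms = clause.split()
    # Ordinary terms are everything that is not an if: meta word.
    plain_terms = [t for t in terms if not t.startswith("if:")]
    present = {t.lower() for t in plain_terms}

    result = list(plain_terms)
    for term in terms:
        if not term.startswith("if:"):
            continue
        parts = term[len("if:"):].split("!")
        keyword = parts[0]
        on_true = parts[1] if len(parts) > 1 else ""
        # The else branch is optional, as the documentation states.
        on_false = parts[2] if len(parts) > 2 else ""
        chosen = on_true if keyword.lower() in present else on_false
        # Multiple injected keywords are separated by plus signs.
        result.extend(word for word in chosen.split("+") if word)
    return " ".join(result)


if __name__ == "__main__":
    print(expand_if_meta_words(
        "oracle if:oracle!info:http://oracle.com/!site:none"))
    print(expand_if_meta_words("bike if:corvette!fast+car"))
```

Under these assumptions, a clause containing oracle gains info:http://oracle.com/, any other clause gains site:none, and a clause lacking corvette is left unchanged when no else branch is given.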
