Documentation updated to version 0.68

Chris Pollett [2011-05-19]

Filename
en-US/pages/documentation.thtml

diff --git a/en-US/pages/documentation.thtml b/en-US/pages/documentation.thtml
index 0f0131a..b6a6bc0 100755
--- a/en-US/pages/documentation.thtml
+++ b/en-US/pages/documentation.thtml
@@ -6,7 +6,7 @@
         <li><a href="#required">Requirements</a></li>
         <li><a href="#installation">Installation and Configuration</a></li>
         <li><a href="#files">Summary of Files and Folders</a></li>
-        <li><a href="#interface">The Yioop! User Interface</a></li>
+        <li><a href="#interface">The Yioop! Search and User Interface</a></li>
         <li><a href="#passwords">Managing Accounts</a></li>
         <li><a href="#userroles">Managing Users and Roles</a></li>
         <li><a href="#crawls">Managing Crawls</a></li>
@@ -29,7 +29,7 @@
     might be the right choice for your search engine needs. In the remainder
     of this document after the introduction, we will discuss how to get
     and install Yioop!, the files and folders used in the Yioop!,
-    user, role, and crawl management in the Yioop! system,  localization in
+    user, role, and crawl management in the Yioop! system, localization in
     the Yioop! system, and finally hacking Yioop!
     </p>
     <p>Since the mid-1990s a wide variety of search engine technologies
@@ -172,7 +172,13 @@
     word-document indexes (this is much the same idea as Pig). One of these
     operators allows one to make results from unions of stored crawls. This
     allows one to do many smaller topic specific crawls and combine them with
-    your own weighting scheme into a larger crawl. This approach is not
+    your own weighting scheme into a larger crawl. A second useful operator
+    allows you to display a certain number of results from a given subquery
+    and then go on to display results from other subqueries. This lets you
+    build a crawl presentation in which the first result
+    comes from the open crawl results, the second result from the
+    Wikipedia results, the third result is an image, and any remaining
+    results come from the open search results. Combining crawls in this way is not
     unlike topic-sensitive page ranking approaches [<a href="#H2002">H2002</a>].
     Yioop! comes with a GUI facility to make the creation of these crawl mixes
     easy. Another useful operator Yioop! supports allows one to perform
@@ -274,8 +280,9 @@
     deploy.</li>
     <li>It determines search results using a number of iterators which
     can be combined like a simplified relational algebra.</li>
-    <li>Yioop! supports a union operator and a GUI interface which makes
-    it easy to combine results from several crawl indexes.</li>
+    <li>Yioop! supports a GUI which makes
+    it easy to combine results from several crawl indexes to create unique
+    result presentations.</li>
     <li>Indexing occurs as crawling happens, so when a crawl is stopped,
     it is ready to be used to handle search queries immediately.</li>
     <li>Yioop! has a GUI form that allows users to specify meta words
@@ -677,7 +684,7 @@ these subfolders there are files containing the respective kinds of data.</dd>
     <p><a href="#toc">Return to table of contents</a>.</p>


-    <h2 id='interface'>The Yioop! User Interface</h2>
+    <h2 id='interface'>The Yioop! Search and User Interface</h2>
 <p>
 The main search form for Yioop! looks like:
 </p>
@@ -718,6 +725,22 @@ words separated by whitespace. This will cause Yioop! to compute a
 the terms listed. Yioop! also supports a variety of other search box
 commands and query types:</p>
 <ul>
+<li><b>#<em>num</em>#</b> sequences in a query act as query presentation markers.
+When a query is first parsed, it is split into columns, with each #<em>num</em>#
+acting as a column boundary. For example, <em>bob #2# bob sally #3# sally #1#</em>.
+A given column is used to present <em>num</em> results, where <em>num</em> is
+the number between the hash marks immediately after it. So in the query above,
+the subquery <em>bob</em> is used for the first two search results, then the
+subquery <em>bob sally</em> is used for the next three results. The last
+column is always used for any remaining results, so in this case
+the subquery <em>sally</em> would be used for all remaining results even though
+its <em>num</em> is 1. If a query does not contain any #<em>num</em>#
+markers, it is treated as having a single column.
+</li>
+<li>Separating query terms with a vertical bar | results in a disjunctive
+query. Disjunctions are parsed after the presentation markers above.
+So a search on: <em>Chris | Pollett</em> would return pages that have
+either the word Chris or the word Pollett or both.</li>
 <li>Putting the query in quotes, for example "Chris Pollett", will cause
 Yioop! to perform an exact match search. Yioop! in this case would only
 return documents that have the string "Chris Pollett" rather than just
@@ -726,9 +749,6 @@ Also, using the quote syntax, you can perform searches such as
 "Chris * Homepage" which would return documents which have the word Chris
 followed by some text followed by the word Homepage.
 </li>
-<li>Separating query terms with a vertical bar | results in a disjunctive
-query. So a search on: <em>Chris | Pollett</em> would return pages that have
-either the word Chris or the word Pollett or both.</li>
 <li>If the query has at least one word not prefixed by -, then adding
 a `-' in front of a word in a query means search for results not containing
 that term. So a search on: <em>of -the</em> would return results containing
@@ -746,6 +766,11 @@ pages found at that url or on that host.
 with the given extension. So a search: <em>Chris Pollett filetype:pdf</em>
 would return all documents containing the words Chris and Pollett and with
 extension pdf.</li>
+<li><b>media:kind</b> returns summaries of all documents found
+of the given media kind. Currently, text and images are the two
+supported media kinds. So adding <em>media:images</em> to the
+search terms returns only image results matching
+the query keywords.</li>
 <li><b>server:web_server_name</b> returns summaries of all documents
 served on that kind of web server. For example, <i>server:apache</i>.</li>
 <li><b>version:version_number</b> returns summaries of all documents
@@ -1101,12 +1126,36 @@ OdpRdfArchiveBundle
     <img src='resources/EditMix.png' alt='The Edit Mixes form'/>
     <p>Using the "Back" link on this page will take you to the prior screen.
     The first text field on the edit page lets you rename your mix if you so
-    desire. Beneath this is a table listing the current components of this
-    crawl mix. You can use this table to edit the weightings of crawl
-    components. You can also use it to delete existing components of the mix.
-    To add new components to a crawl mix use the drop-down beneath the
-    table. For changes on this page to take effect, the "Save" button beneath
-    this drop-down must be clicked.
+    desire. Beneath this is an "Add Groups" button. A group is a weighted
+    list of crawls. If only one group were present, then search results would
+    come from any crawl listed for this group. A given result's score
+    would be the weighted sum of the scores of the crawls in the group it
+    appears in. Search results are displayed in descending order according to
+    this total score. If more than one group is present, then the number of
+    results field for each group determines how many of the displayed results
+    should come from that group.
+    For the Crawl Mix displayed above, there are three groups: the first group
+    is used to display the first result, the second group is used to display
+    the second result, and the last group is used to display any remaining
+    search results.</p>
+    <p>The UI for groups works as follows. The top row of a group has three
+    columns. To add new components to a group, use the drop-down in the first
+    column. The second column controls for how many results
+    the particular crawl group should be used. The results of different groups
+    are presented in the order the groups appear in the crawl mix. The last
+    group is always used to display any remaining results for a search. The
+    "delete group" link in the third column can be used to delete a group.
+    Beneath the first row of a group, there is one row for each crawl that
+    belongs to the group. The first column of a crawl row says how that
+    crawl's scores should be weighted in the search results for that group.
+    The second column is the name of the crawl. The third column is a
+    space-separated list of words to add to the query when obtaining results
+    for that crawl. For example, in the first group above, the only crawl is
+    test, and it has a weight of 1. The keyword we inject for this crawl is
+    media:text, so only results from this crawl that consist of text, rather
+    than image pages, are returned. The last link in a crawl row deletes that
+    crawl from the group. For changes on this page to take effect, the "Save"
+    button on this page must be clicked.
     </p>
     <p><a href="#toc">Return to table of contents</a>.</p>

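The column rules for the #num# presentation markers and the | disjunction described in this diff can be sketched in Python. This is a hypothetical illustration of the documented behavior, not Yioop!'s own PHP parser: the names `parse_columns`, `allocate_slots`, and `Column` are assumptions. It assumes a column's count is taken from the marker immediately after it, the last column supplies all remaining results, and | is split within each column only after the markers are handled.

```python
import re
from typing import List, Tuple

# Hypothetical sketch of the documented rules; not Yioop!'s PHP parser.
MARKER = re.compile(r"#(\d+)#")

Column = Tuple[List[str], int]  # (disjuncts of the subquery, num results)


def parse_columns(query: str) -> List[Column]:
    """Split a query into columns at #num# markers, then split each on |."""
    # With one capture group, re.split alternates text, num, text, ..., text.
    parts = MARKER.split(query)
    columns: List[Column] = []
    for i in range(0, len(parts), 2):
        text = parts[i].strip()
        if not text:
            continue  # e.g. the empty text after a trailing marker
        # A column's count is the marker immediately after it (1 if none).
        num = int(parts[i + 1]) if i + 1 < len(parts) else 1
        # Disjunctions are split only after the presentation markers.
        disjuncts = [d.strip() for d in text.split("|") if d.strip()]
        columns.append((disjuncts, num))
    return columns


def allocate_slots(columns: List[Column], total: int) -> List[int]:
    """Map each of the first `total` result positions to a column index."""
    if not columns:
        return []
    owners: List[int] = []
    for idx, (_, num) in enumerate(columns[:-1]):
        owners.extend([idx] * num)
    owners = owners[:total]
    # The last column supplies all remaining results, whatever its num.
    owners.extend([len(columns) - 1] * (total - len(owners)))
    return owners


if __name__ == "__main__":
    cols = parse_columns("bob #2# bob sally #3# sally #1#")
    print(cols)                     # [(['bob'], 2), (['bob sally'], 3), (['sally'], 1)]
    print(allocate_slots(cols, 8))  # [0, 0, 1, 1, 1, 2, 2, 2]
```

A query without markers, such as Chris | Pollett, yields one column whose disjuncts are Chris and Pollett, and that column fills every result slot.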
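The crawl-mix behavior described in this diff, where a result's score within a group is the weighted sum of its scores in that group's crawls and groups fill result slots in order, can be sketched as follows. This is a hypothetical Python model, not Yioop!'s code: `CrawlComponent`, `Group`, `rank_group`, `present`, and the per-crawl `scores` dictionaries (url to score) are illustrative assumptions, and the sketch omits the injected keywords such as media:text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


# Hypothetical data model; Yioop!'s real crawl-mix classes differ.
@dataclass
class CrawlComponent:
    name: str
    weight: float
    scores: Dict[str, float]  # hypothetical: url -> this crawl's score for the query


@dataclass
class Group:
    num_results: int
    crawls: List[CrawlComponent] = field(default_factory=list)


def rank_group(group: Group) -> List[str]:
    """Rank urls by the weighted sum of their scores over the group's crawls."""
    totals: Dict[str, float] = {}
    for crawl in group.crawls:
        for url, score in crawl.scores.items():
            totals[url] = totals.get(url, 0.0) + crawl.weight * score
    # Descending total score; ties broken by url so the order is deterministic.
    return sorted(totals, key=lambda u: (-totals[u], u))


def present(groups: List[Group], total: int) -> List[str]:
    """Fill result slots group by group; the last group fills whatever remains."""
    results: List[str] = []
    for i, group in enumerate(groups):
        ranked = rank_group(group)
        is_last = i == len(groups) - 1
        take = total - len(results) if is_last else group.num_results
        take = min(take, total - len(results))
        results.extend(ranked[:take])
    return results


if __name__ == "__main__":
    g1 = Group(1, [CrawlComponent("test", 1.0, {"a": 0.9, "b": 0.5})])
    g2 = Group(1, [CrawlComponent("wiki", 1.0, {"w1": 0.7, "w2": 0.6})])
    g3 = Group(10, [CrawlComponent("test", 2.0, {"c": 0.4, "d": 0.3}),
                    CrawlComponent("wiki", 1.0, {"c": 0.5, "e": 0.9})])
    print(present([g1, g2, g3], 4))  # ['a', 'w1', 'c', 'e']
```

Here the third group ranks c first because its weighted total is 2.0 × 0.4 + 1.0 × 0.5 = 1.3. This sketch does not remove a url that appears in more than one group's results.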