diff --git a/en-US/pages/documentation.thtml b/en-US/pages/documentation.thtml index 6232c9b..0cd8a23 100755 --- a/en-US/pages/documentation.thtml +++ b/en-US/pages/documentation.thtml @@ -1,5 +1,5 @@ <div class="docs"> -<h1>Yioop! Documentation v 0.721</h1> +<h1>Yioop! Documentation v 0.74</h1> <h2 id='toc'>Table of Contents</h2> <ul> <li><a href="#intro">Introduction</a></li> @@ -17,7 +17,7 @@ <li><a href="#commandline">Yioop! Command-line Tools</a></li> <li><a href="#references">References</a></li> </ul> - + <h2 id="intro">Introduction</h2> <p>The Yioop! search engine is designed to allow users to produce indexes of a web-site or a collection of @@ -300,7 +300,8 @@ is used.</li> <li>Yioop! supports a GUI interface which makes it easy to combine results from several crawl indexes to create unique - result presentations.</li> + result presentations. These combinations can be done in a conditional + manner using "if:" meta words.</li> <li>Indexing occurs as crawling happens, so when a crawl is stopped, it is ready to be used to handle search queries immediately.</li> <li>Yioop! supports an indexing plugin architecture to make it @@ -372,8 +373,8 @@ extension=php_curl.dll <h3>Memory Requirements</h3> <p>In addition, to the prerequisite software listed above, Yioop! also has certain memory requirements. By default bin/queue_server.php - requires 1000MB, bin/fetcher.php requires 750MB, and index.php requires - 200MB. These values are set near the tops of each of these files in turn + requires 1000MB, bin/fetcher.php requires 850MB, and index.php requires + 500MB. 
These values are set near the tops of each of these files in turn with a line like:</p> <pre> ini_set("memory_limit","1000M"); @@ -819,45 +820,72 @@ the word "of" but not containing the word "the".</li> <b>link:url</b>, <b>ip:ip_address</b> are equivalent to having clicked on the Similar, Cached, InLinks, IP address links, respectively, on a summary with that url and ip address.</li> -<li><b>site:url</b> or <b>site:host</b> returns all of the summaries of -pages found at that url or on that host. -</li> -<li><b>info:url</b> returns the summary in the Yioop! index for the given url. -</li> +</ul> +<p>The remaining query types we list in alphabetical order:</p> +<ul> +<li><b>date:Y</b>, <b>date:Y-M</b>, <b>date:Y-M-D</b> +returns summaries of all documents crawled on the given date. +For example, <i>date:2011-01</i> returns all documents crawled in +January, 2011.</li> <li><b>filetype:extension</b> returns summaries of all documents found with the given extension. So a search: <em>Chris Pollett filetype:pdf</em> would return all documents containing the words Chris and Pollett and with extension pdf.</li> +<li><b>index:timestamp</b> or <b>i:timestamp</b> causes the search to +make use of the IndexArchive with the given timestamp. So a search like: +<em>Chris Pollett i:1283121141 | Chris Pollett</em> +takes results from the index with timestamp 1283121141 for +Chris Pollett and unions them with results for Chris Pollett in the default +index.</li> +<li><b>if:keyword!add_keywords_on_true!add_keywords_on_false</b> checks the +current conjunctive query clause for "keyword"; if present, it adds +"add_keywords_on_true" to the clause, else it adds the keywords +"add_keywords_on_false". This meta word is typically used as part of a +crawl mix. The else condition does not need to be present. 
As an example, +<em>if:oracle!info:http://oracle.com/!site:none</em> might be added to +a crawl mix so that if a query had the keyword oracle then the site +http://oracle.com/ would be returned by the given query clause. As part +of a larger crawl mix this could be used to make oracle's homepage appear +at the top of the query results. If you would like to inject multiple +keywords then separate the keywords using plus rather than white space. +For example, <i>if:corvette!fast+car</i>.</li> +<li><b>info:url</b> returns the summary in the Yioop! index for the given url +only. +</li> +<li><b>lang:IETF_language_tag</b> returns summaries of all documents +whose language can be determined to match the given language tag. +For example, <i>lang:en-US</i>.</li> <li><b>media:kind</b> returns summaries of all documents found of the given media kind. Currently, the text and images are the two supported media kinds. So one can add to the search terms <em>media:images</em> to get only image results matching the query keywords.</li> +<li><b>mix:name</b> or <b>m:name</b> tells Yioop! to use the crawl mix "name" +when computing the results of the query. The section on mixing crawl indexes has +more details about crawl mixes. If the name of the original mix had spaces, +for example, <i>cool mix</i>, then to use the mix you would need to replace +the spaces with plusses, <i>m:cool+mix</i>.</li> +<li><b>modified:Y</b>, <b>modified:Y-M</b>, <b>modified:Y-M-D</b> +returns summaries of all documents which were last modified on the given date. +For example, <i>modified:2010-02</i> returns all documents which were last +modified in February, 2010.</li> +<li><b>os:operating_system</b> returns summaries of all documents +served on servers using the given operating system. For example, +<i>os:centos</i>; make sure to use lower case.</li> <li><b>server:web_server_name</b> returns summaries of all documents served on that kind of web server. 
For example, <i>server:apache</i>.</li> +<li><b>site:url</b>, <b>site:host</b>, or <b>site:domain</b> returns all of +the summaries of pages found at that url, host, or domain. As an example, +<em>site:http://prints.ucanbuyart.com/lithograph_art.html</em>, +<em>site:http://prints.ucanbuyart.com/</em>, +<em>site:prints.ucanbuyart.com</em>, <em>site:.ucanbuyart.com</em>, +<em>site:ucanbuyart.com</em>, <em>site:com</em>, will all return results with +decreasing specificity. To return all pages listed in a Yioop! index you can +do <i>site:all</i>. +</li> <li><b>version:version_number</b> returns summaries of all documents served on web servers with the given version number. For example, one might have a query <i>server:apache version:2.2.9</i>.</li> -<li><b>os:operating_system</b> returns summaries of all documents -served on servers using the given operating system. For example, -<i>os:centos</i>, make sure to use lower case.</li> -<li><b>lang:IETF_language_tag</b> returns summaries of all documents -whose language can be determined to match the given language tag. -For example, <i>lang:en-US</i>.</li> -<li><b>date:Y</b>, <b>date:Y-M</b>, <b>date:Y-M-D</b> -returns summaries of all documents crawled on the given date. -For example, <i>date:2011-01</i> returns all document crawled in -January, 2011.</li> -<li><b>modified:Y</b>, <b>modified:Y-M</b>, <b>modified:Y-M-D</b> -returns summaries of all documents which were last modified on the given date. -For example, <i>modified:2010-02</i> returns all document which were last -modifed in February, 2010.</li> -<li><b>index:timestamp</b> or <b>i:timestamp</b> causes the search to -make use of the IndexArchive with the given timestamp. 
So a search like: -<em>Chris Pollett i:1283121141 | Chris Pollett</em> -take results from the index with timestamp 1283121141 for -Chris Pollett and unions them with results for Chris Pollett in the default -index</li> <li><b>weight:some_number</b> or <b>w:some_number</b> has the effect of multiplying all score for this portion of a query by some_number. For example, <em>Chris Pollett | Chris Pollett site:wikipedia.org w:5</em> @@ -1248,7 +1276,9 @@ OdpRdfArchiveBundle first column has the name of the mix, the second column says how the mix is built out of component crawls, and the actions columns allows you to edit the mix, set it as the default index for Yioop! search results, or - delete the mix. When you create a new mix it also shows up on the Settings + delete the mix. You can also append "m:name+of+mix" or "mix:name+of+mix" + to a query to use that mix without having to set it as the default index. + When you create a new mix it also shows up on the Settings page. Creating a new mix or editing an existing mix sends you to a second page:</p> <img src='resources/EditMix.png' alt='The Edit Mixes form'/> @@ -1281,9 +1311,12 @@ OdpRdfArchiveBundle first group above, the only crawl is test, it has a weight of 1. The keywords we inject for this crawl are media:text. This means we will get whatever results from this crawl that consisted of text rather than - image pages. The last link in a crawl row allows you to delete a crawl - from a crawl group. For changes on this page to take effect, the "Save" - button beneath this drop-down must be clicked. + image pages. Keywords can be used to make a particular component of + a crawl mix behave in a conditional manner by using the "if:" meta word + described in the search and user interface section. The last link in a + crawl row allows you to delete a crawl from a crawl group. For changes on + this page to take effect, the "Save" button beneath this drop-down must + be clicked. 
</p> <p><a href="#toc">Return to table of contents</a>.</p> <h2 id='filter'>Search Filter</h2> @@ -1602,8 +1635,8 @@ ASCII <p> The command-line script bin/query_tool.php can be use to query indices in the Yioop! WORK_DIRECTORY/cache. This tool can be used on an index regardless of whether or not Apache is running. It can be -used for long running queries to put query results into memcache or filecache -that might timeout when run within a browser. The command-line arguments +used for long running queries that might timeout when run within a browser +to put their results into memcache or filecache. The command-line arguments for the query tool are: </p> <pre> diff --git a/en-US/pages/downloads.thtml b/en-US/pages/downloads.thtml index dfbdefe..1e8af84 100755 --- a/en-US/pages/downloads.thtml +++ b/en-US/pages/downloads.thtml @@ -2,12 +2,12 @@ <h2>Yioop! Releases</h2> <p>The Yioop! source code is still at an alpha stage. </p> <ul> +<li><a href="http://www.seekquarry.com/viewgit/?a=archive&p=yioop&h=a1f11ce82d47abb95a30b7801f3d1ed2d8259489&hb=03dcb447966793b512a30fb55426f73e79a605b3&t=zip" + >Version 0.74-ZIP</a></li> +</li> <li><a href="http://www.seekquarry.com/viewgit/?a=archive&p=yioop&h=c4aa1557604578a2b7c9b801c71a831a20242ffb&hb=6fd42f91a0de1c542f89556accb7ff44713efe28&t=zip" >Version 0.721-ZIP</a></li> </li> -<li><a href="http://www.seekquarry.com/viewgit/?a=archive&p=yioop&h=285678274124065f7071992f3c506d354f759379&hb=01e0c5cfa14265c88174ddd635542f72ddc3dac7&t=zip" - >Version 0.701-ZIP</a></li> -</li> </ul> <h2>Git Repository</h2> <p>The Yioop! git repository allows anonymous read-only access. If you would to