
Chris Pollett [2012-09-17]
Revising docs for Version 0.9, a=chris
Filename
en-US/pages/about.thtml
en-US/pages/documentation.thtml
en-US/pages/downloads.thtml
en-US/pages/install.thtml
en-US/pages/resources.thtml
diff --git a/en-US/pages/about.thtml b/en-US/pages/about.thtml
index b3c5732..e76d95e 100755
--- a/en-US/pages/about.thtml
+++ b/en-US/pages/about.thtml
@@ -17,7 +17,7 @@ source search engine software distributed on the seekquarry.com
 site.</p>
 <p>The name Yioop! has the following history:
 I was looking for names that hadn't already been registered. My
-wife is Vietnamese so I thought I might have better luck with
+wife is Vietnamese, so I thought I might have better luck with
 Vietnamese words since all the English ones seemed to have been taken.
 I started with the word giup, which is the way to spell 'help'
 in Vietnamese if you remove the accents. It was already taken.
@@ -27,11 +27,12 @@ combined the two to get Yioop!</p>

 <h1>Dictionary Data</h1>
 <p>
-Bloom filters for n grams on the Yioop! test site were generated using
+<a href="http://en.wikipedia.org/wiki/Bloom_Filter">Bloom filters</a> for
+n grams on the Yioop! test site were generated using
 <a href="http://dumps.wikimedia.org/other/pagecounts-raw/">Wikimedia
 Page View Statistics</a>.
-Tries for word suggestion for all languages other than
-Vietnamese were built
+<a href="http://en.wikipedia.org/wiki/Trie">Trie</a>'s for word suggestion
+for all languages other than Vietnamese were built
 using the <a href="http://en.wiktionary.org/wiki/Wiktionary:Frequency_lists"
 >Wiktionary Frequency List</a>. These are available under a
 <a href="http://creativecommons.org/licenses/by-sa/3.0/">Creative
@@ -54,13 +55,14 @@ with localization: Mary Pollett,
 Jonathan Ben-David, Andrea Brunetti,
 Thanh Bui, Sujata Dongre, Animesh Dutta,
  Youn Kim, Akshat Kukreti, Vijeth Patil, Chao-Hsin Shih,
-and Sugi Widjaja. Thanks to
-Ravi Dhillon, Tanmayee Potluri, Shawn Tice, and Sandhya Vissapragada for
+and Sugi Widjaja. Thanks to Ravi Dhillon, Akshat Kukreti, Tanmayee Potluri,
+Shawn Tice, and Sandhya Vissapragada for
 creating patches for Yioop! issues. Several of my master's students have done
-projects related to Yioop!: Amith Chandranna, Priya Gangaraju, Vijaya Pamidi,
-Vijeth Patil, and Vijaya Sinha. Amith's code related to an Online version of
-the HITs algorithm is not currently in the main branch of Yioop!, but it is
-obtainable from <a href="http://www.cs.sjsu.edu/faculty/pollett/masters/<?php
+projects related to Yioop!: Amith Chandranna, Priya Gangaraju, Jalid Jalid,
+Vijaya Pamidi, Vijeth Patil, and Vijaya Sinha. Amith's code related to an
+Online version of the HITs algorithm is not currently in the main branch of
+Yioop!, but it is obtainable from
+<a href="http://www.cs.sjsu.edu/faculty/pollett/masters/<?php
 ?>Semesters/Spring10/amith/index.shtml">Amith Chandranna's student page</a>.
 Vijaya Pamidi developed a Firefox web traffic extension for Yioop!
 Her code is also obtainable from <a href="http://www.cs.sjsu.edu/faculty/<?php
diff --git a/en-US/pages/documentation.thtml b/en-US/pages/documentation.thtml
index f9db41e..0d50927 100755
--- a/en-US/pages/documentation.thtml
+++ b/en-US/pages/documentation.thtml
@@ -1,7 +1,8 @@
 <div class="docs">
-<h1>Yioop! Documentation v 0.88</h1>
+<h1>Yioop! Documentation v 0.90</h1>
     <h2 id='toc'>Table of Contents</h2>
     <ul>
+        <li><a href="#quick">Preface: Quick Start Guides</a></li>
         <li><a href="#intro">Introduction</a></li>
         <li><a href="#requirements">Requirements</a></li>
         <li><a href="#installation">Installation and Configuration</a></li>
@@ -14,6 +15,7 @@
         <li><a href="#mixes">Mixing Crawl Indexes</a></li>
         <li><a href="#page-options">Options for Pages that are Indexed</a></li>
         <li><a href="#editor">Results Editor</a></li>
+        <li><a href="#sources">Search Sources</a></li>
         <li><a href="#machines">GUI for Managing Machines and Servers</a></li>
         <li><a href="#localizing">Localizing Yioop! to a New Language</a></li>
         <li><a href="#framework">Building a Site using Yioop! as Framework</a>
@@ -23,7 +25,13 @@
         <li><a href="#commandline">Yioop! Command-line Tools</a></li>
         <li><a href="#references">References</a></li>
     </ul>
-
+    <h2 id="quick">Preface: Quick Start Guides</h2>
+    <p>This document serves as a detailed description of the
+    Yioop search engine. If you want to get started using Yioop! now,
+    but perhaps in less detail, you might want to first read the
+    <a href="http://localhost/git/seek_quarry/?c=main&p=install">Installation
+    Guides</a> page.
+    </p>
     <h2 id="intro">Introduction</h2>
     <p>The Yioop! search engine is designed to allow users
     to produce indexes of a web-site or a collection of
@@ -38,11 +46,12 @@
     search engine technologies which exist today, how Yioop! fits into this
     eco-system, and when Yioop! might be the right choice for your search
     engine needs. In the remainder of this document after the introduction,
-    we discuss how to get and install Yioop!, the files and folders used
-    in Yioop!, user, role, crawl, and machine management in the Yioop! system,
-    localization in the Yioop! system, building a site using the Yioop!
-    framework, embedding Yioop! in an existing web-site,
-    customizing Yioop!, and the Yioop! command-line tools.
+    we discuss how to get and install Yioop!; the files and folders used
+    in Yioop!; user, role, search, subsearch, crawl,
+     and machine management in the Yioop! system;
+    localization in the Yioop! system; building a site using the Yioop!
+    framework; embedding Yioop! in an existing web-site;
+    customizing Yioop!; and the Yioop! command-line tools.
     </p>
     <p>Since the mid-1990s a wide variety of search engine technologies
     have been explored. Understanding some of this history is useful
@@ -325,6 +334,13 @@
     you can have searchable access to many data sets as well as have the
     ability to maintain your data going forward.
     </p>
+    <p>Another important aspect of creating a modern search engine is
+    the ability to display various media sources in an appropriate way.
+    Yioop comes with built-in subsearch abilities for images, where
+    results are displayed as image strips; video, where thumbnails for
+    video are shown; and news, where news items are grouped together and
+    a configurable set of news/twitter feeds can be set to be updated on an
+    hourly basis.</p>
     <p>
     This concludes the discussion of how Yioop! fits into the current and
     historical landscape of search engines and indexes. Here is short summary
@@ -363,13 +379,17 @@
     HTML, BMP, DOC, ePub, GIF, JPG, PDF, PPT, PPTX, PNG, RSS, RTF, sitemaps,
     SVG, XLSX, and XML. It has a web interface for controlling which amongst
     these filetypes (or all of them) you want to index.</li>
+    <li>Yioop supports subsearches geared towards presenting certain
+    kinds of media such as images, video, and news. The list of video and
+    news sites can be configured through the GUI. News sites are updated
+    hourly.</li>
     <li>Crawling, indexing, and serving search results can be done on a
     single machine or distributed across several machines.</li>
     <li>It uses a simplified distributed model that is straightforward to
     deploy.</li>
     <li>The fetcher/queue_server processes on several machines can be
     managed through the web interface of a main Yioop! instance.</li>
-    <li>Yioop! installations can created with a variety of topologies:
+    <li>Yioop! installations can be created with a variety of topologies:
     one queue_server and many fetchers or several queue_servers and
     many fetchers.</li>
     <li>It determines search results using a number of iterators which
@@ -537,7 +557,12 @@ seekquarry.com</a>.
 After downloading and unzipping it, move the Yioop! search engine into some
 folder under your web server's document root. Yioop! makes use of an auxiliary
 folder to store profile/crawl data. Before Yioop! will
-run you must configure this directory. To do this
+run you must configure this directory. This can be done in one
+of two ways: either through the web interface (the preferred way), as we
+will now describe, or using the configs/configure_tool.php script
+(which is harder, but might be suitable for some VPS settings), which will be
+described in the <a href="#commandline">command line tools section</a>.
+From the web interface, to configure this directory,
 point your web browser to where your Yioop! folder is located, a
 configuration page should appear and let you set the
 path to the auxiliary folder (Search Engine Work Directory). This
@@ -631,12 +656,17 @@ you which element and links you would like to have presented on the search
 landing and search results pages. The Word Suggest check box controls whether
 a drop down of word suggestions should be presented by Yioop! when a user
 starts typing in the Search box. The Subsearch checkbox controls whether the
-links for Image and Video search appear in the top bar of Yioop! The Signin
-checkbox controls whether to display the link to the page for users to sign in
-to Yioop!  The Cache checkbox toggles whether a link to the cache of a search
-item should be displayed as part of each search result. The Similar checkbox
-toggles whether a link to similar search items should be displayed as part
-of each search result. The Inlinks checkbox toggles
+links for Image, Video, and News search appear in the top bar of Yioop!
+You can actually configure what these links are in the
+<a href="#sources">Search Sources</a>
+activity. The checkbox here is a global setting for displaying them or
+not. In addition, if this is unchecked, then the hourly activity of
+downloading any RSS media sources for the News subsearch will be turned
+off. The Signin checkbox controls whether to display the link to the page
+for users to sign in to Yioop! The Cache checkbox toggles whether a link to
+the cache of a search item should be displayed as part of each search result.
+The Similar checkbox toggles whether a link to similar search items should be
+displayed as part of each search result. The Inlinks checkbox toggles
 whether a link for inlinks to a search item should be displayed as part
 of each search result. Finally, the IP address checkbox toggles
 whether a link for pages with the same ip address should be displayed as part
@@ -748,7 +778,11 @@ probably not need to edit any of these files directly as you can set the most
 common configuration settings from with the admin panel of Yioop! The file
 config.php controls a number of parameters about how data is stored, how,
 and how often, the queue_server and fetchers communicate, and which file types
-are supported by Yioop! createdb.php can be used to create a bare instance of
+are supported by Yioop! configure_tool.php is a command-line tool which
+can perform some of the configurations needed to get a Yioop! installation
+running. It is only necessary in some virtual private server settings --
+the preferred way to configure Yioop! is through the web interface.
+createdb.php can be used to create a bare instance of
 the Yioop! database with a root admin user having no password. This script is
 not strictly necessary as the database should be creatable via the admin panel;
 however, it can be useful if the database isn't working for some reason.
@@ -904,6 +938,8 @@ crawls themselves are NOT stored in the database.</dd>
 <dt>log</dt><dd>When the fetcher and queue_server are run as daemon processes
 log messages are written to log files in this folder. Log rotation is also done.
 These log files can be opened in a text editor or console app.</dd>
+<dt>query</dt><dd>This folder is used to store caches of already performed
+queries when file caching is being used.</dd>
 <dt>schedules</dt><dd>This folder has three kinds of subfolders:
 IndexDataUNIX_TIMESTAMP, RobotDataUNIX_TIMESTAMP, and
 ScheduleDataUNIX_TIMESTAMP. When a fetcher communicates with the web app
@@ -929,7 +965,8 @@ will appear, but with a somewhat confused summary based only on link text;
 the results editor allows one to give a meaningful summary for Facebook.</dd>
 <dt>temp</dt><dd>This is used for storing temporary files that Yioop!
 creates during the crawl process. For example, temporary files used while
-making thumbnails.</dd>
+making thumbnails. Each fetcher has its own temp folder, so you might
+also see folders 0-temp, 1-temp, etc.</dd>
 </dl>
     <p><a href="#toc">Return to table of contents</a>.</p>

@@ -991,34 +1028,61 @@ on the page are used to locate the page via a search query. This
 can be viewed as an "SEO" view of the page.</p>
 <img src='resources/CacheSEO.png' alt='Example Cache SEO Results'
 width="70%"/>
-<p>In addition, to a straightforward web search, one can also do image and
-video search by clicking on the Images or Video link in the top bar
-of Yioop search pages. Below are some examples of what these look like
-for a search on "Obama":</p>
+<p>In addition to a straightforward web search, one can also do image,
+video, and news searches by clicking on the Images, Video, or News links in
+the top bar of Yioop search pages. Below are some examples of what these look
+like for a search on "Obama":</p>
 <img src='resources/ImageSearch.png' alt='Example Image Search Results'
 width="70%"/>
 <img src='resources/VideoSearch.png' alt='Example Video Search Results'
 width="70%"/>
+<img src='resources/NewsSearch.png' alt='Example News Search Results'
+width="70%"/>
 <p>When Yioop! crawls a page it adds one of the following meta
-words to the page media:text, media:image, or media:video. A usual
+words to the page: media:text, media:image, or media:video. RSS feed
+sources that have been added to Media Sources under the <a href="#sources"
+>Search Sources</a>
+activity are downloaded each hour. Each RSS item on such a downloaded
+page has the meta word media:news added to it. A usual
 web search just takes the search terms provided to perform a search.
-An Images or Video search tacks on to the search terms, media:image or
-media:video. Detection of images is done via mimetype at initial page download
-time. At this time a thumbnail is generated. When search results are presented
-it is this cached thumbnail that is shown. So image search does not leak
-information to third party sites. On any search results page with images,
-Yioop! tries to group the images into a thumbnail strip. This is true of
-both normal and images search result pages. In the case of image search result
-pages, except for not-yet-downloaded pages, this results in almost all of
-the results being the thumbnail strip. Video page detection is not done
-through mimetype as popular sites like YouTube, Vimeo, and others vary in
-how they use Flash or video tags to embed video on a web page. Yioop!
-uses the format of the URL from particular web sites to guess if the page
-contains a video or not. To get a thumbnail for the video it uses the
-API of the particular site in question. <b>This could leak information to third
-party sites about your search.</b>
+An Images, Video, or News search tacks onto the search terms media:image,
+media:video, or media:news, respectively. Detection of images is done via
+mimetype at
+initial page download time. At this time a thumbnail is generated. When search
+results are presented it is this cached thumbnail that is shown. So image
+search does not leak information to third party sites. On any search results
+page with images, Yioop! tries to group the images into a thumbnail strip. This
+is true of both normal and image search result pages. In the case of image
+search result pages, except for not-yet-downloaded pages, this results in
+almost all of the results being the thumbnail strip. Video page detection is
+not done through mimetype as popular sites like YouTube, Vimeo, and others
+vary in how they use Flash or video tags to embed video on a web page. Yioop!
+uses the Video Media sources that have been added in the Search Sources
+activity to detect whether a link is in the format of a video page. To get
+a thumbnail for the video it again uses the method for rewriting the video
+url to an image link specified for the particular site in question in
+Search Sources; i.e., the thumbnail will be downloaded from the original site.
+<b>This could leak information to third party sites about your search.</b>
 </p>
-<p>A basic query to the Yioop! search form is typically a sequence of
+<p>The format of News search results is somewhat different from usual
+search results. News search results can appear during a normal web search,
+in which case they will appear clustered together, with a leading
+link "News results for ...". No snippets will be shown for these links,
+but the original media source for the link will be displayed, as will the
+time at which the item first appeared. On the News subsearch
+page, underneath the link to the item, the complete RSS description
+of the news item is displayed. In both settings, it is possible to click
+on the media source name next to the news item link. This will take one
+to a page of search results listing all articles from that media source.
+For instance, if one were to click on the Yahoo News text above
+one would go to results for all Yahoo News articles. This is equivalent
+to doing a search on: media:news:Yahoo+News . If one clicks on the News
+subsearch, not having specified a query yet, then all stored
+news items in the current language will be displayed, roughly ranked by
+recency. If one has RSS media sources which are set to be from
+different locales, then this will be taken into account on this blank query
+News page.</p>
+<p>Turning now to the topic of how to enter a query in Yioop:
+A basic query to the Yioop! search form is typically a sequence of
 words seperated by whitespace. This will cause Yioop! to compute a
 "conjunctive query", it will look up only those documents which contain all of
 the terms listed. Yioop! also supports a variety of other search box
@@ -1105,8 +1169,8 @@ is useful for checking if a particular page is in the index.
 whose language can be determined to match the given language tag.
 For example, <i>lang:en-US</i>.</li>
 <li><b>media:kind</b> returns summaries of all documents found
-of the given media kind. Currently, the text, image, and video are the three
-supported media kinds. So one can add to the
+of the given media kind. Currently, text, image, news, and video are
+the four supported media kinds. So one can add to the
 search terms <em>media:image</em> to get only image results matching
 the query keywords.</li>
 <li><b>mix:name</b> or <b>m:name</b> tells Yioop! to use the crawl mix "name"
@@ -1761,6 +1825,87 @@ OdpRdfArchiveBundle
     http://www.cs.sjsu.edu/faculty/pollett/ would not appear in search
     results.</p>
     <p><a href="#toc">Return to table of contents</a>.</p>
+    <h2 id='sources'>Search Sources</h2>
+    <p>The Search Sources activity is used to manage the media sources
+    available to Yioop!, and also to control the subsearch links displayed
+    on the top navigation bar. The Search Sources activity looks like:</p>
+<img src='resources/SearchSources.png' alt='The Search Sources form'/>
+    <p>The top form is used to add a media source to Yioop! Currently,
+    the Media Kind can be either Video or RSS. Video Media sources
+    are used to help Yioop! recognize links which are to videos on
+    a web video site such as YouTube. This helps in both tagging
+    such pages with the meta word media:video in a Yioop index, and
+    in being able to render a thumbnail of the video in the search results.
+    When the media kind is set to video, this form has three fields:
+    Name, which should be a short familiar name for the video site (for example,
+    YouTube); URL, which should consist of a url pattern by which to
+    recognize a video on that site; and Thumb, which consists of a url pattern
+    that replaces the original pattern in order to find the thumbnail for that
+    video. For example, the value of URL for YouTube is:</p>
+    <pre>
+    http://www.youtube.com/watch?v={}&
+    </pre>
+    <p>This will match any url which begins with
+    http://www.youtube.com/watch?v= followed by some string followed by
+    &amp; followed by another string. The {} indicates that from
+    v= to the &amp; should be treated as the identifier for the video. The
+    Thumb url in the case of YouTube is:
+    </p>
+    <pre>
+    http://img.youtube.com/vi/{}/2.jpg
+    </pre>
+    <p>If the identifier in the first video link was yv0zA9kN6L8, then
+    using the above, when displaying a thumb for the video, Yioop!
+    would use the image source:</p>
+    <pre>
+    http://img.youtube.com/vi/yv0zA9kN6L8/2.jpg
+    </pre>
+    <p>Some video sites have more complicated APIs for specifying thumbnails.
+    In that case, you can still do media:video tagging but display
+    a blank thumbnail rather than suggest a thumbnail link. To do this,
+    one uses the thumb url:</p>
+    <pre>
+    http://www.yioop.com/resources/blank.png?{}
+    </pre>
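+    <p>To make the substitution mechanism concrete, here is a minimal PHP
+    sketch of how such a {} pattern could be used to recognize a video url
+    and rewrite it into its thumbnail url. This is only an illustration,
+    not Yioop!'s actual code; the function name videoThumb and the regex
+    details are made up for this example.</p>
+    <pre>
+&lt;?php
+// Illustrative only: turn a {} url pattern into a regex, and if $url
+// matches it, substitute the captured identifier into $thumb_pattern.
+function videoThumb($url, $url_pattern, $thumb_pattern)
+{
+    $regex = str_replace('\{\}', '([^&]+)', preg_quote($url_pattern, "@"));
+    if (preg_match("@^" . $regex . "@", $url, $matches)) {
+        return str_replace("{}", $matches[1], $thumb_pattern);
+    }
+    return false;
+}
+echo videoThumb(
+    "http://www.youtube.com/watch?v=yv0zA9kN6L8&feature=related",
+    "http://www.youtube.com/watch?v={}&",
+    "http://img.youtube.com/vi/{}/2.jpg");
+// outputs: http://img.youtube.com/vi/yv0zA9kN6L8/2.jpg
+?&gt;
+    </pre>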
+    <p>If one selects the media kind to be RSS (really simple syndication,
+    a kind of news feed), then the media sources
+    form has three fields: Name, again a short familiar name for the
+    RSS feed; URL, the url of the RSS feed; and Language, the language
+    the RSS feed is in. This last element is used to control whether or
+    not a news item will display given the current language settings of
+    Yioop! If under the Configure activity, the subsearch checkbox
+    is checked so that subsearches are displayed, then Yioop! will
+    try to download its list of RSS feeds hourly. This does not need
+    a queue_server or a fetcher running, and is accomplished by making
+    a curl request from the web app to the sites in question on the
+    first search performed on Yioop! after an hour has elapsed since
+    the last RSS download.</p>
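+    <p>As a rough sketch of what this hourly update involves (this is an
+    illustration only, not Yioop!'s actual updater, which also handles
+    scheduling, deduplication, and locales; the function name
+    downloadFeedItems and the array fields are made up), the web app
+    essentially curls each configured feed url and turns each item of the
+    feed into a news item tagged with its media source:</p>
+    <pre>
+&lt;?php
+// Illustrative only: download one RSS feed and extract its items.
+function downloadFeedItems($feed_url, $source_name)
+{
+    $ch = curl_init($feed_url);
+    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
+    $data = curl_exec($ch);
+    curl_close($ch);
+    $items = array();
+    $rss = simplexml_load_string($data);
+    if (!$rss) { return $items; }
+    foreach ($rss->channel->item as $item) {
+        $items[] = array(
+            "title" => (string)$item->title,
+            "link" => (string)$item->link,
+            "description" => (string)$item->description,
+            "pubdate" => strtotime((string)$item->pubDate),
+            // the source name is what media:news:Source queries match on
+            "source" => $source_name
+        );
+    }
+    return $items;
+}
+?&gt;
+    </pre>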
+    <p>Beneath this top form is a table listing all the currently
+    added media sources, their urls, and a link that allows one to delete
+    the source.</p>
+    <p>The second form on the page is the Add a Subsearch form.
+    This form has three fields: Folder Name is a short familiar
+    name for the subsearch; it will appear as part of the query
+    string when the given subsearch is being performed. For example,
+    if the folder name were news, then s=news will appear as part of
+    the query string when a news subsearch is being done. Folder Name
+    is also used to make the localization identifier used in translating
+    the subsearch's name into different languages. This identifier will
+    have the format db_subsearch_identifier. For example,
+    db_subsearch_news. Index Source, the second form element, is used
+    to specify a crawl or a crawl mix that the given subsearch
+    should use in returning results. Results per Page, the last form element,
+    controls the number of search results which should appear when using
+    this kind of subsearch.</p>
+    <p>Beneath this form is a table listing all the currently added
+    subsearches and their properties. The actions column at the end of this
+    table lets one either localize or delete a given subsearch. Clicking
+    localize takes one to the Manage Locales page for the default locale
+    and that particular subsearch localization identifier, so that you can
+    fill in a value for it. Remembering the name of this identifier,
+    one can then in Manage Locales navigate to other locales, and fill
+    in translations for them as well, if desired.</p>
+    <p><a href="#toc">Return to table of contents</a>.</p>
     <h2 id='machines'>GUI for Managing Machines and Servers</h2>
     <p>Rather than use the command line as described in the
     <a href="#prereqs">Prerequisites for Crawling</a> section, it is possible
@@ -2417,6 +2562,63 @@ xmlns:atom="http://www.w3.org/2005/Atom"
     <p>This completes the discussion of how to write an indexing plugin.</p>
     <p><a href="#toc">Return to table of contents</a>.</p>
     <h2 id='commandline'>Yioop! Command-line Tools</h2>
+    <h3>Configuring Yioop from the Command-line</h3>
+    <p>In a multiple queue server and fetcher setting, one might have web access
+    only to the name server machine -- all the other machines might be on
+    virtual private servers to which one has only command-line access. Hence,
+    it is useful to be able to set up a work directory and configure Yioop
+    through the command-line. To do this one can use the script
+    configs/configure_tool.php. One can run it from the command-line within
+    the configs folder, with a line like:
+    </p>
+    <pre>
+php configure_tool.php
+    </pre>
+    <p>When launched, this program will display a menu like:</p>
+    <pre>
+
+YIOOP! CONFIGURATION TOOL
++++++++++++++++++++++++++
+
+Checking Yioop configuration...
+===============================
+Check Passed.
+Using configs/local_config.php so changing work directory above may not work.
+===============================
+
+Available Options:
+==================
+(1) Create/Set Work Directory
+(2) Change root password
+(3) Set Default Locale
+(4) Debug Display Set-up
+(5) Search Access Set-up
+(6) Search Page Elements and Links
+(7) Name Server Set-up
+(8) Crawl Robot Set-up
+(9) Exit program
+
+Please choose an option:
+    </pre>
+    <p>
+    Except for the Change root password option, these correspond to the
+    different fieldsets on the Configure activity. The command-line forms
+    one gets by selecting one of these choices let one set the same
+    values as were described earlier in the
+    <a href="#installation">Installation</a> section. The change root password
+    option lets one set the account password for root, i.e., the main admin
+    user. On a non-nameserver machine, it is probably simpler to go with
+    a sqlite database, rather than hit a global mysql database from
+    each machine. Such a barebones local database set-up would typically
+    only have one user, root.</p>
+    <p>Another thing to consider, when configuring a collection of Yioop!
+    machines in such a setting, is that by default, under Search Access Set-up,
+    subsearch is unchecked. This means the RSS feeds won't be downloaded
+    hourly on such machines. If one checks this, they will. This may or
+    may not make sense to do -- it might be advantageous to distribute the
+    downloading of RSS feeds across several machines -- any machine in
+    a Yioop cluster can send media news results in response to a search query.
+    </p>
     <h3>Examining the contents of WebArchiveBundle's and
     IndexArchiveBundles's</h3>
     <p>
@@ -2619,7 +2821,7 @@ In: Seventh International World-Wide Web Conference
 MIT Press. 2010.</dd>
 <dt id="DG2004">[DG2004]</dt>
 <dd>Jeffrey Dean and Sanjay Ghemawat.
-<a href="http://labs.google.com/papers/mapreduce-osdi04.pdf"
+<a href="http://research.google.com/archive/mapreduce-osdi04.pdf"
 >MapReduce: Simplified Data Processing on Large Clusters</a>.
 OSDI'04: Sixth Symposium on Operating System Design and Implementation. 2004<dd>
 <dt id="GGL2003">[GGL2003]</dt>
diff --git a/en-US/pages/downloads.thtml b/en-US/pages/downloads.thtml
index a1c4548..830af4a 100755
--- a/en-US/pages/downloads.thtml
+++ b/en-US/pages/downloads.thtml
@@ -2,11 +2,20 @@
 <h2>Yioop! Releases</h2>
 <p>The Yioop! source code is still at an alpha stage. </p>
 <ul>
+<li><a href="http://www.seekquarry.com/viewgit/?a=archive&p=yioop&h=3ba7c0901b792891b6b279732e5184668b294e44&hb=8b105749c471bbfe97df88e84df8f9c239027a01&t=zip"
+    >Version 0.90-ZIP</a></li>
 <li><a href="http://www.seekquarry.com/viewgit/?a=archive&p=yioop&h=1be2b50b8436998ce8d2d41f5db3b470610aa817&hb=6fc863b1aaf26d8a0abf49a2aad9c7ce440ea307&t=zip"
     >Version 0.88-ZIP</a></li>
-<li><a href="http://www.seekquarry.com/viewgit/?a=archive&p=yioop&h=2bb7f54c7f52d4eebf605430400088de1c0505cf&hb=876e9b0380d96d975d55cbcf11fbfd1ad03a6278&t=zip"
-    >Version 0.861-ZIP</a></li>
 </ul>
+<h2>Installation</h2>
+<p>The documentation page has information about the
+<a href="http://localhost/git/seek_quarry/<?php ?>
+?c=main&p=documentation#requirements">requirements</a> of and
+<a href="http://localhost/git/seek_quarry/<?php ?>
+?c=main&p=documentation#installation">installation procedure</a> for
+Yioop!. These sections have what you need for a typical Mac and Linux
+home user. The <a href="?c=main&amp;p=install">Install Guides</a> page
+explains how to get Yioop to work in some other common settings.</p>
 <h2>Git Repository / Contributing</h2>
 <p>The Yioop! git repository allows anonymous read-only access. If you would to
 contribute to Yioop!, just do a clone of the most recent code,
diff --git a/en-US/pages/install.thtml b/en-US/pages/install.thtml
new file mode 100755
index 0000000..3298c32
--- /dev/null
+++ b/en-US/pages/install.thtml
@@ -0,0 +1,451 @@
+<h1>Installation Guides</h1>
+    <ul>
+        <li><a href="#xampp">XAMPP on Windows</a></li>
+        <li><a href="#wamp">WAMP</a></li>
+        <li><a href="#cpanel">CPanel</a></li>
+        <li><a href="#multiple">System with Multiple Queue Servers</a></li>
+    </ul>
+
+<h2 id="xampp">XAMPP on Windows</h2>
+<ol>
+<li>Download <a
+    href="http://technet.microsoft.com/en-us/sysinternals/bb896649">pstools</a>
+    (which contains psexec).</li>
+<li>Download <a
+    href="http://www.apachefriends.org/en/xampp-windows.html">Xampp</a>
+(Note: Yioop! 0.9 or higher works on the latest version;
+Yioop! 0.88 or lower works up to Xampp 1.7.7)</li>
+<li>Install xampp</li>
+<li>Copy PsExec from the pstools zip folder to C:\xampp\php</li>
+<li>Open control panel. Go to System =&gt; Advanced system settings =&gt;
+Advanced. Click on Environment Variables. Look under System Variables and
+select Path. Click Edit. Tack onto the end of the Variable Values:
+<pre>
+C:\xampp\php;
+</pre>
+Click OK a bunch of times to get rid of the windows. Close the control panel window.
+Reopen it and go to the same place to make sure the path variable really
+was changed.
+</li>
+<li>Edit the file C:\xampp\php\php.ini in Notepad. Search on curl:
+change the line:
+<pre>
+;extension=php_curl.dll
+</pre>
+to
+<pre>
+extension=php_curl.dll
+</pre>
+Then go to the start of the file and search on post_max_size. Change the line
+<pre>
+post_max_size = 8M
+</pre>
+to
+<pre>
+post_max_size = 32M
+</pre>
+Start Apache.</li>
+<li>Download <a href="http://www.seekquarry.com/viewgit/?a=summary&p=yioop"
+>Yioop!</a> (you should choose some version &gt; 0.88 or latest)
+Unzip it into
+<pre>
+C:\xampp\htdocs
+</pre>
+Rename the downloaded folder yioop (so you now have
+a folder C:\xampp\htdocs\yioop).
+</li>
+<li>
+Point your browser at:
+<pre>
+http://localhost/yioop/
+</pre>
+enter under "Search Engine Work Directory", the path
+<pre>
+C:/xampp/htdocs/yioop_data
+</pre>
+It will ask you to log into Yioop. Log in with username root and an empty password.
+</li>
+<li>
+In Yioop's Configure screen continue filling out your settings:
+<pre>
+Default Language: English
+Debug Display: (all checked)
+Search access: (all checked)
+Database Set-up: (left unchanged)
+Search Auxiliary Links Displayed: (all checked)
+Name Server Set-up
+Server Key: 0
+Name Server Url: http://localhost/yioop/
+Crawl Robot Name: TestBot
+Robot Instance: A
+Robot Description: TestBot should be disallowed from everywhere because
+the installer of Yioop did not customize this to his system.
+Please block this ip.
+</pre>
+</li>
+<li>Go to Manage Machines and add a single machine under Add Machine:
+<pre>
+Machine Name: Local
+Machine Url: http://localhost/yioop/
+Is Mirror: (uncheck)
+Has Queue Server: (check)
+Number of Fetchers 1
+Submit
+</pre>
+</li>
+<li>You might need to restart the machine to get the next steps to work</li>
+<li>In Manage Machines, click ON on the queue server and on your fetcher.
+For each, click
+on the log file and make sure that after at most two minutes you are seeing log
+entries appear.</li>
+<li>Now go to Manage Crawls. Click on Options.
+Set the options you would like for your crawl,
+click Save.</li>
+<li>Type a name for the crawl and start the crawl. Let it crawl for a while,
+until you see Total URLs Seen &gt; 1.</li>
+<li>Click Stop Crawl and wait for the crawl to appear in the previous
+crawls list. Set it as the default crawl. Then you can search using this index.
+</li>
+</ol>
+<p>
+The above set-up is for a non-command line crawl, and it works as described.
+For command line crawls on versions of Yioop prior to Version 0.9 you might
+have the problem that log messages are written to Xampp's PHP error log
+because Yioop uses the PHP error_log function and on Xampp this is where
+it defaults to. This is not an issue in Version 0.9 or above.
+</p>
+
+
+<h2 id="wamp">Wamp</h2>
+<p>
+These instructions should work for Yioop! Version 0.84 and above.
+WampServer allows you to run a 64 bit version of PHP.
+</p>
+<ol>
+<li>Download <a
+    href="http://technet.microsoft.com/en-us/sysinternals/bb896649">pstools
+    (which contains psexec)</a>.</li>
+<li>Download <a
+    href="http://www.wampserver.com/en/">WampServer</a> (Note: Yioop! 0.9 or
+higher works with PHP 5.4)</li>
+<li>Download <a href="http://www.seekquarry.com/viewgit/?a=summary&p=yioop"
+>Yioop!</a> (you should choose some version &gt; 0.88 or latest)
+Unzip it into
+<pre>
+C:\wamp\www
+</pre>
+Rename the downloaded folder yioop (so you now have
+a folder C:\wamp\www\yioop).</li>
+<li>Edit php.ini to enable multicurl and change the post_max_size. To do
+this use the Wamp dock tool and navigate to wamp =&gt; php =&gt; extension.
+Turn on curl. Next navigate to wamp =&gt; php =&gt; php.ini.
+Do a find on post_max_size and set its value to 32M.</li>
+<li>Wamp has two php.ini files. The one we just edited is in
+<pre>
+C:\wamp\bin\apache\Apache2.2.21\bin
+</pre>
+You need to also edit the php.ini in
+<pre>
+C:\wamp\bin\php\php5.3.10
+</pre>
+Depending on your version of Wamp the PHP version number may be different.
+Open this php.ini in Notepad, search on curl, and uncomment that line.
+Similarly, edit post_max_size and set it to 32M.
+</li>
+<li>Copy PsExec.exe to C:\wamp\bin\php\php5.3.10 .</li>
+<li>Go to control panel =&gt; system =&gt; advanced system settings =&gt;
+advanced =&gt; environment variables =&gt; system variables =&gt; path.
+Click edit and add to the path variable:
+<pre>
+;C:\wamp\bin\php\php5.3.10;
+</pre>
+Exit control panel, then re-enter to double check that the path really was
+added to the end.</li>
+<li> Next go to
+wamp =&gt; apache =&gt; restart service. In a browser go to Yioop =&gt;
+Configure and input the following settings:
+<pre>
+Search Engine Work Directory: C:/yioop_data
+Default Language: English
+Debug Display: (all checked)
+Search access: (all checked)
+Database Set-up: (left unchanged)
+Search Auxiliary Links Displayed: (all checked)
+Name Server Set-up
+Server Key: 0
+Name Server Url: http://localhost/yioop/
+Crawl Robot Name: TestBot
+Robot Instance: A
+Robot Description: TestBot should be disallowed from everywhere because
+the installer of Yioop did not customize this to his system.
+Please block this ip.
+</pre>
+</li>
+<li>Go to Manage Machines. Add a single machine under Add Machine using the
+settings:
+<pre>
+Machine Name: Local
+Machine Url: http://localhost/yioop/
+Is Mirror: (uncheck)
+Has Queue Server: (check)
+Number of Fetchers 1
+Submit
+</pre>
+</li>
+<li>Under Machine Information turn the Queue Server and Fetcher On.</li>
+<li>Go to Manage Crawls. Click on the options to set up where you want to crawl.
+Type in a name for the crawl and click start crawl.</li>
+<li>Let it crawl for a while, until you see Total URLs Seen &gt; 1.</li>
+<li>Then click Stop Crawl and wait for the crawl to appear in the previous
+crawls list. Set it as the default crawl. You should be
+able to search using this index.
+</li>
+</ol>
+
+<h2 id="cpanel">CPanel</h2>
+<p>
+Generally, it is not practical to do your crawling in a cPanel hosted website.
+However, cPanel works perfectly fine for hosting the results of a crawl you did
+elsewhere. Here we briefly describe how to do this. When capacity planning
+your installation, as a rule of thumb, you should
+expect your index to be of comparable size (number of bytes) to the sum of
+the sizes of the pages you downloaded.
+</p>
+<ol>
+<li>Download <a href="http://www.seekquarry.com/viewgit/?a=summary&p=yioop"
+>Yioop!</a> (you should choose some version &gt; 0.88 or latest)
+to your local machine.</li>
+<li>In cPanel go to File Manager and navigate to the place you want on your
+server to serve Yioop from. Click upload and choose your zip file so as to
+upload it to that location.</li>
+<li>Select the uploaded file and click extract to extract the zip file to a
+folder. Reload the page. Rename the extracted folder, if necessary.
+</li>
+<li>For the rest of these instructions, let's assume the site being used
+for testing is mysite.my. If at this point one browsed to:
+<pre>
+http://mysite.my/yioop/
+</pre>
+One would see:
+<pre>
+SERVICE AVAILABLE ONLY VIA LOCALHOST UNTIL CONFIGURED
+</pre>
+Browse to the yioop/configs folder. Create a new file local_config.php.
+Add the code:
+<pre>
+&lt;?php
+define('NO_LOCAL_CHECK', 'true');
+?&gt;
+</pre>
+Now if you browse to:
+<pre>
+http://mysite.my/yioop/
+</pre>
+you should see a place to enter a work directory path.
+</li>
+<li>The work directory must be an absolute path. In the cPanel File Manager,
+at the top of the directory tree on the left hand side of the screen,
+it lists a file path such as
+<pre>
+/public_html/mysite.my/yioop/configs
+</pre>
+(if we still happened to be in the configs directory).
+You want to make this a full path. Typically, this means tacking on
+/home/username (what you log in with) to the path so far.
+To keep things simple set the work directory to be:
+<pre>
+/home/username/public_html/mysite.my/yioop_data
+</pre>
+Here username should be your user name. After filling this in as the
+Work Directory, click Load or Create. You will see it briefly display a
+complete profile page, then log you out, saying you must re-login with
+username root and a blank password.
+</li>
+<li>Go to Manage account and give yourself a better login and password.</li>
+<li><p>Go to Configure. Many cPanel installations still use PHP 5.2, so you might
+see:
+<pre>
+The following required items were missing:
+PHP Version 5.3 or Newer
+</pre>
+This means you won't be able to crawl from within cPanel, but you will still be
+able to serve search results. To do this, perform a crawl elsewhere,
+for instance on your laptop.</p></li>
+<li>After performing a crawl, go to Manage Crawls
+on the machine where you performed the crawl.
+Look under Previous Crawls and locate the crawl you want to upload.
+Note its timestamp.</li>
+<li>Go to THIS_MACHINES_WORK_DIRECTORY/cache. Locate the folder
+IndexDatatimestamp, where timestamp is the timestamp of the crawl you want.
+ZIP this folder.</li>
+<li>In FileManager, under cPanel on the machine you want to host your crawl,
+navigate to
+<pre>
+yioop_data/cache
+</pre>
+Upload the ZIP and extract it.</li>
+<li>Go to Manage Crawls on this instance of Yioop,
+locate this crawl under Previous Crawls and set it as the default crawl.
+You should now be able to search and get results from the crawl.
+</li>
+</ol>
+<p>
+You will probably want to uncheck Cache in the Configure activity as in this
+hosted setting it is somewhat hard to get the cache page feature of Yioop! to
+work.
+</p>
+
+<h2 id="multiple">System with Multiple Queue Servers</h2>
+<p>
+This section assumes you have already successfully installed and performed
+crawls with Yioop! in the single queue_server setting and have succeeded in
+using the Manage Machines activity to start and stop a queue_server and
+fetcher. If not, you
+should consult one of the installation guides above or the general
+<a href="http://localhost/git/seek_quarry/?c=main&p=documentation">Yioop
+Documentation</a>.
+</p>
+<p>
+Before we begin, what are the advantages in using more than one queue_server?
+</p>
+<ol>
+<li>If the queue_servers are running on different processors, then they can
+each be indexing part of the crawl data independently, and so this can speed up
+indexing.</li>
+<li>After the crawl is done, the index will typically exist on multiple
+machines and each needs to search a smaller amount of data before sending it to
+the name server for final merging. So queries can be faster.</li>
+</ol>
+<p>
+For the purposes of this guide, we will consider the case of two queue_servers;
+the same idea works for more. To keep things especially simple, we have both of
+ these queue_servers on the same laptop. Advantages
+(1) and (2) will likely not apply in this case, but we are describing this
+for testing purposes -- you can take the same idea and have the queue servers
+on different machines after going through this tutorial.
+</p>
+
+<ol>
+<li>Download and install yioop as you would in the single queue_server case.
+But do this twice. For example, on your machine, under document root you
+might have two subfolders
+<pre>
+git/yioop1
+</pre>
+and
+<pre>
+git/yioop2
+</pre>
+each with a complete copy of yioop.
+We will use the copy git/yioop1 as an instance of Yioop with both a name_server
+and a queue_server; the git/yioop2 will be an instance with just a
+queue_server.
+</li>
+<li>
+On the Configure element of the git/yioop1 instance, set the work directory
+to be something like
+<pre>
+/Applications/XAMPP/xamppfiles/htdocs/crawls1
+</pre>
+for the git/yioop2 instance we set it to be
+<pre>
+/Applications/XAMPP/xamppfiles/htdocs/crawls2
+</pre>
+i.e., the work directories of these two instances should be different!
+For each crawl in the multiple queue_server setting, each instance will
+have a copy of those documents it is responsible for. So if we did a crawl with
+timestamp 10, each instance would have a WORK_DIR/cache/IndexData10
+folder, and these folders would be disjoint in their contents from those of
+any other instance.
+</li>
+<li>
+Continuing down on the Configure element for each instance, make sure under the
+Search Access fieldset Web, RSS, and API are checked.</li>
+<li>Next make sure the name server and server key are the same for both
+instances; i.e., in the Name Server Set-up fieldset, one might set:
+<pre>
+Server Key:123
+Name Server URL:http://localhost/git/yioop1/
+</pre>
+</li>
+<li>
+The Crawl Robot Name should also be the same for the two instances, say:
+<pre>
+TestBotFeelFreeToBan
+</pre>
+but we want the robot instance to be different, say 1 and 2.
+</li>
+<li>Go to the Manage Machine element for git/yioop1, which is the name server.
+Only the name server needs to manage machines,
+so we won't do this for git/yioop2 (or for any other queue servers
+if we had them).</li>
+<li>Add machines for each yioop instance we want to manage with the name server.
+In this particular case, fill out and submit the Add Machine form twice,
+the first time with:
+<pre>
+Machine Name:Local1
+Machine Url:http://localhost/git/yioop1/
+Is Mirror: unchecked
+Has Queue Server: checked
+Num Fetchers: 1
+</pre>
+the second time with:
+<pre>
+Machine Name:Local2
+Machine Url:http://localhost/git/yioop2/
+Is Mirror: unchecked
+Has Queue Server: checked
+Num Fetchers: 1
+</pre>
+The Machine Name should be different for each Yioop instance, but can otherwise
+be whatever you want. Is Mirror controls whether this is a replica of some other
+node -- I'll save that for a different install guide at some point. If we
+wanted to run more fetchers, we could have chosen a bigger number for
+Num Fetchers (fetchers are the processes that download web pages).
+</li>
+<li>
+After the above steps, there should be two machines listed under
+Machine Information. Click the On button on the queue server and the
+fetcher of both of them. They should turn green. If you click the log link
+you should start seeing new messages (it refreshes once every 30 secs) after
+at most a minute or so.
+</li>
+<li>
+At this point you are ready to crawl in the multiple queue server setting. You
+can use Manage Crawl to set-up, start and stop a crawl exactly as in the single
+queue_server setting.
+</li>
+<li>
+Perform a crawl and set it as the default index. You can
+then turn off all the queue servers and fetchers in Manage Machines, if you
+like.</li>
+<li>
+If you type a query into the search bar of the name server (git/yioop1),
+you should be getting merged results from both queue servers. To check
+if this is working, under Configure on the name server (git/yioop1) make sure
+Query Info is checked and that
+Use Memcache and Use FileCache are not checked -- leave the latter two
+unchecked for testing; we can check them later when we know things are working.
+When you
+perform a query now, at the bottom of the page you should see a horizontal
+rule followed by Query Statistics followed
+by all the queries performed in calculating results. One of these should be
+PHRASE QUERY. Underneath it you should see Lookup Offset Times and beneath this
+Machine Subtimes: ID_0 and ID_1. If these appear, you know it's working.
+</li>
+</ol>
+<p>When a query is typed into the name server it tacks no:network onto it
+and asks it of all the queue servers, then merges the results.
+So if you type "hello" as the search, i.e., if you go to the url
+<pre>
+http://localhost/git/yioop1/?q=hello
+</pre>
+the git/yioop1 script will make in parallel the curl requests
+<pre>
+http://localhost/git/yioop1/?q=hello&ne ... alse&raw=1 (raw=1 means no grouping)
+http://localhost/git/yioop2/?q=hello&ne ... alse&raw=1
+</pre>
+get the results back, merge them, and finally return the result to the user.
+The network=false tells http://localhost/git/yioop1/ to actually do the query
+lookup rather than make a network request.
+</p>
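+<p>
+The following sketch shows the general shape of this fan-out and merge. It is
+illustrative only and not Yioop!'s actual network code: the function name
+networkQuery, the f=json output format, and the PAGES/SCORE fields are
+assumptions made for this example.
+</p>
+<pre>
+&lt;?php
+// Illustrative only: send the same query to several queue servers in
+// parallel with curl_multi, then naively merge the results by score.
+function networkQuery($query, $server_urls)
+{
+    $multi = curl_multi_init();
+    $handles = array();
+    foreach ($server_urls as $url) {
+        $ch = curl_init($url . "?q=" . urlencode($query) .
+            "&network=false&raw=1&f=json");
+        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
+        curl_multi_add_handle($multi, $ch);
+        $handles[] = $ch;
+    }
+    do { // run all the transfers in parallel
+        curl_multi_exec($multi, $running);
+        curl_multi_select($multi);
+    } while ($running > 0);
+    $results = array();
+    foreach ($handles as $ch) {
+        $response = json_decode(curl_multi_getcontent($ch), true);
+        if (isset($response["PAGES"])) {
+            $results = array_merge($results, $response["PAGES"]);
+        }
+        curl_multi_remove_handle($multi, $ch);
+    }
+    curl_multi_close($multi);
+    usort($results, function ($a, $b) { // highest score first
+        return ($b["SCORE"] > $a["SCORE"]) ? 1 : -1;
+    });
+    return $results;
+}
+?&gt;
+</pre>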
diff --git a/en-US/pages/resources.thtml b/en-US/pages/resources.thtml
index 7d0bcca..26f3674 100755
--- a/en-US/pages/resources.thtml
+++ b/en-US/pages/resources.thtml
@@ -1,7 +1,8 @@
 <h1>Resources</h1>
 <ul>
 <li><a href="/phpBB/">Discussion Boards</a></li>
+<li><a href="?c=main&amp;p=install">Install Guides</a></li>
 <li><a href="/mantis/">Issue Tracking</a></li>
-<li><a href="/viewgit/">View Git of Yioop repository</a></li>
 <li><a href="/yioop-docs/">PHPDocumentor docs for Yioop source code</a></li>
+<li><a href="/viewgit/">View Git of Yioop repository</a></li>
 </ul>