Chris Pollett [2013-01-05]
Version 0.92 less final accurate 0.92 link on download page, a=chris
Filename
en-US/pages/about.thtml
en-US/pages/documentation.thtml
en-US/pages/downloads.thtml
en-US/pages/install.thtml
diff --git a/en-US/pages/about.thtml b/en-US/pages/about.thtml
index 5119429..4ed47a6 100755
--- a/en-US/pages/about.thtml
+++ b/en-US/pages/about.thtml
@@ -7,8 +7,8 @@ began in Nov. 2009 and had its first publically available release in August,
 </p>

 <h1>The Yioop and SeekQuarry Names</h1>
-<p>When looking for names for my search engine I was originally
-thinking about using the name SeekQuarry which hadn't been
+<p>When looking for names for my search engine, I was originally
+thinking about using the name SeekQuarry, whose domain name hadn't been
 registered. After deciding that I would use Yioop for the name
 of my search engine site, I decided I would use SeekQuarry as a
 site to publish the software that is used in the Yioop engine.
@@ -34,7 +34,7 @@ Page View Statistics</a>.
 <a href="http://en.wikipedia.org/wiki/Trie">Trie</a>'s for word suggestion
 for all languages other than Vietnamese were built
 using the <a href="http://en.wiktionary.org/wiki/Wiktionary:Frequency_lists"
->Wiktionary Frequency List</a>. These are available under a
+>Wiktionary Frequency Lists</a>. These are available under a
 <a href="http://creativecommons.org/licenses/by-sa/3.0/">Creative
 Commons Share Alike 3.0 Unported License</a> as described on <a
 href="http://en.wikipedia.org/wiki/Wikipedia:Database_download">Wikipedia's
@@ -59,28 +59,36 @@ Ahmed Kamel Taha, and Sugi Widjaja. Thanks to Ravi Dhillon, Akshat Kukreti,
 Tanmayee Potluri, Shawn Tice, and Sandhya Vissapragada for
 creating patches for Yioop issues. Several of my master's students have done
 projects related to Yioop: Amith Chandranna, Priya Gangaraju,
-Vijaya Pamidi, Vijeth Patil, and Vijaya Sinha. Amith's code related to an
-Online version of the HITs algorithm is not currently in the main branch of
+Vijaya Pamidi, Vijeth Patil, Vijaya Sinha, Tarun Ramaswamy, Tanmayee Potluri,
+and Sandhya Vissapragada. Amith's code relates to an
+online version of the HITS algorithm. It is not currently in the main branch of
 Yioop, but it is obtainable from
-<a href="http://www.cs.sjsu.edu/faculty/pollett/masters/
-Semesters/Spring10/amith/index.shtml">Amith Chandranna's student page</a>.
+<a href="http://www.cs.sjsu.edu/faculty/pollett/masters/Semesters/
+Spring10/amith/index.shtml">Amith Chandranna's student page</a>.
 Vijaya Pamidi developed a Firefox web traffic extension for Yioop.
 Her code is also obtainable from <a href="http://www.cs.sjsu.edu/faculty/
 pollett/masters/Semesters/Fall10/vijaya/index.shtml">Vijaya Pamidi's
-master's pages</a>. <a href="http://www.cs.sjsu.edu/faculty/pollett/
+master's pages</a>. Her project was later extended by
+<a href="http://www.cs.sjsu.edu/faculty/pollett/masters/Semesters/Fall11/tarun/
+index.shtml?Bio.shtml#top">Tarun Ramaswamy</a>. Neither of these projects
+is currently in the main Yioop repository. <a href="http://www.cs.sjsu.edu/
+faculty/pollett/
 masters/Semesters/Fall11/vijeth/index.shtml">Vijeth Patil's Project</a>
 involved adding support for Twitter and RSS feeds to add additional real-time
 search results to the standard search results. This is not currently in the main
+repository. <a href="http://www.cs.sjsu.edu/faculty/pollett/masters/Semesters/
+Fall11/tanmayee/index.shtml">Tanmayee Potluri's Project</a> added
+log and database archive iterators for Yioop. It is currently not in the main
 branch. <a href="http://www.cs.sjsu.edu/faculty/pollett/
 masters/Semesters/Spring11/amith/index.shtml">Vijaya Sinha's Project</a>
-concerned using Open Street Map data in Yioop. This code is not currently
-in the main branch. Priya's code served as the
+concerned using Open Street Map data in Yioop. This code is also not currently
+in the main branch. Priya Gangaraju's code served as the
 basis for the plugin feature currently in Yioop. Shawn Tice's CS288
 project served as the basis of a rewrite of the archive crawl feature of Yioop
 for the multi-queue server setting. Sandhya Vissapragada's Master project served
 as the basis for the autosuggest and spell checking functionality in Yioop.
 The following other students have created text processors for Yioop: Nakul
-Natu (pptx), Vijeth Patil (epub), and Tarun Pepira (xslx). Akshat Kukreti
+Natu (pptx), Vijeth Patil (epub), and Tarun Ramaswamy (xlsx). Akshat Kukreti
 created the Italian language stemmer based on the Snowball version at
-<a href="http://tartarus.org">http://tartarus.org</a>.
+<a href="http://tartarus.org/">http://tartarus.org/</a>.
 </p>
diff --git a/en-US/pages/documentation.thtml b/en-US/pages/documentation.thtml
index f7ea6de..d43a69a 100755
--- a/en-US/pages/documentation.thtml
+++ b/en-US/pages/documentation.thtml
@@ -26,7 +26,7 @@
         <li><a href="#references">References</a></li>
     </ul>
     <h2 id="quick">Preface: Quick Start Guides</h2>
-    <p>This document serves as a detailed description of the
+    <p>This document serves as a detailed reference for the
     Yioop search engine. If you want to get started using Yioop now,
     but perhaps in less detail, you might want to first read the
     <a href="?c=main&p=install">Installation
@@ -149,9 +149,10 @@
     and the Sawzall language [<a href="#PDGQ2006">PDGQ2006</a>] were built to
     make these multi-round
     distributed computation tasks easier. In the open source community,
-    the <a href="http://hadoop.apache.org/hdfs/"
+    the <a href="http://hadoop.apache.org/docs/hdfs/current/hdfs_design.html"
     >Hadoop Distributed File System</a>,
-    <a href="http://hadoop.apache.org/mapreduce">Hadoop MapReduce</a>,
+    <a href="http://hadoop.apache.org/docs/mapreduce/current/index.html"
+    >Hadoop MapReduce</a>,
     and <a href="http://hadoop.apache.org/pig/">Pig</a> play an analogous role
     [<a href="#W2009">W2009</a>]. Recently, a theoretical framework
     for what algorithms can be carried out as rounds of map inputs to
@@ -282,7 +283,7 @@
     href="http://nutch.apache.org/">Nutch</a>/
     <a href="http://lucene.apache.org/">Lucene</a>/ <a
     href="http://lucene.apache.org/solr/">Solr</a>
-    [<a href="KC2004">KC2004</a>], <a href="http://www.yacy.net/">YaCy</a>,
+    [<a href="#KC2004">KC2004</a>], <a href="http://www.yacy.net/">YaCy</a>,
     and <a href="http://crawler.archive.org/">Heritrix</a>
     [<a href="#MKSR2004">MKSR2004</a>]. Nutch is the original application for
     which the Hadoop infrastructure described above was developed. Nutch
@@ -315,7 +316,7 @@
     http://www.digitalpreservation.gov/formats/fdd/fdd000236.shtml">WARC
     format</a> are often used by TREC conferences to store test data sets such
     as <a href="http://ir.dcs.gla.ac.uk/test_collections/">GOV2</a> and the
-    <a href="http://boston.lti.cs.cmu.edu/Data/clueweb09/">ClueWeb Dataset</a>.
+    <a href="http://lemurproject.org/clueweb09/">ClueWeb Dataset</a>.
     In addition, it was used by grub.org (hopefully, only on a
     temporary hiatus), a distributed, open-source, search engine project in C#.
     Another important format for archiving web pages is the XML format used by
@@ -1006,8 +1007,8 @@ you can click, or use the up down arrows to select one of these suggestion
 to also perform a search.</p>
 <img src='resources/Autosuggest.png' alt='Example suggestions as you type'
 width="70%"/>
-<p>For some non-Roman alphabet scripts such as Telugu you can enter
-words using how they sound using Roman letters and get suggestions
+<p>For some non-Roman alphabet scripts such as Telugu, you can enter
+words phonetically using Roman letters and get suggestions
 in the script in question:</p>
 <img src='resources/TeluguAutosuggest.png' alt='Telugu suggestions for
 roman text' width="70%"/>
@@ -1062,7 +1063,7 @@ Clicking on the history toggle, produces the following interface:
 <img src='resources/CacheHistory.png' alt='Example Cache History UI'
 width="70%"/>
 <p>
-This let's you select different caches of the page in question.
+This lets you select different caches of the page in question.
 </p>
 <p> Clicking the "Toggle extracted summary" link  will show the title, summary,
 and links that were extracted from the full page and indexed. No other terms
@@ -1694,7 +1695,7 @@ http://www.facebook.com/###!Facebook###!A%20famous%20social%20media%20site
     </p>
     <h4 id="archive-crawl">Archive Crawl Options</h4>
     <p>We now consider how to do crawls of previously obtained archives.
-    From the initial crawl options screen clicking on the Archive Crawl
+    From the initial crawl options screen, clicking on the Archive Crawl
     tab gives one the following form:</p>
 <img src='resources/ArchiveCrawlOptions.png' alt='Archive Crawl Options Form'/>
     <p>The dropdown lists all previously done crawls that are available for
@@ -1703,7 +1704,7 @@ http://www.facebook.com/###!Facebook###!A%20famous%20social%20media%20site
     </p>These include both previously done Yioop crawls, previously
 done recrawls (prefixed with RECRAWL::), Yioop Crawl Mixes (prefixed with
     MIX::), and crawls
-    of other file formats such as arc, MediaWiki XML, and ODP RDF which
+    of other file formats, such as arc, MediaWiki XML, and ODP RDF, which
     have been appropriately prepared in the PROFILE_DIR/cache folder
     (prefixed with ARCFILE::).
     You might want to re-crawl an existing Yioop crawl if you want to add
@@ -1720,12 +1721,23 @@ http://www.facebook.com/###!Facebook###!A%20famous%20social%20media%20site
     You might want to do an archive crawl of other file formats
     if you want Yioop to be able to provide search results of their content.
     Once you have selected the archive you want to crawl, you can add meta
-    words as discussed in the previous section and then save your options
-    and go back to the Create Crawl screen to start your crawl. As with
-    a Web Crawl, for an archive crawl you need both the queue_server
-    running and a least one fetcher running to perform a crawl. To re-crawl
-    an archive that was made with several fetchers, each of the fetchers
-    that was used in the creation process should be running.</p>
+    words as discussed in the previous section and then save your options.
+    Afterwards, you go back to the Create Crawl screen to start your crawl.
+    As with a Web Crawl, for an archive crawl you need both the queue_server
+    running and at least one fetcher running to perform a crawl.</p>
+    <p>To re-crawl
+    a previously created web archive that was made using several fetchers,
+    each of the fetchers that was used in the creation process should be
+    running. This is because the data used in the recrawl comes locally
+    from the machine of that fetcher. For other kinds of archive crawls and
+    mix crawls, which fetchers one uses doesn't matter, because archive
+    crawl data comes through the name server. You might also notice that
+    the number of pages in a web archive re-crawl is actually larger than
+    in the initial crawl. This can happen because, during the initial
+    crawl, data was stored in the fetcher's archive bundle and a partial
+    index of this data was sent to the appropriate queue_servers but not
+    yet processed by them. Instead, this data was waiting in a schedules
+    folder to be processed in the event the crawl was resumed.</p>
     <p>To get Yioop to detect arc, MediaWiki, and ODP RDF files you need
    to create a PROFILE_DIR/cache/archives folder on the name
     server machine. Yioop checks subfolders of this for
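To make the folder layout just described concrete, here is a minimal shell
sketch. The subfolder name my_wiki_dump and the dump file name are
hypothetical; each subfolder also needs the description file that this
paragraph goes on to name, which tells Yioop the archive's type.
<pre>
# on the name server: one subfolder per archive to crawl
mkdir -p PROFILE_DIR/cache/archives/my_wiki_dump
# copy the archive to be crawled into its subfolder
# (hypothetical MediaWiki XML dump shown)
cp enwiki-pages-articles.xml.bz2 \
   PROFILE_DIR/cache/archives/my_wiki_dump/
</pre>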
@@ -2418,7 +2430,7 @@ var alpha = "aåàbcçdeéêfghiîïjklmnoôpqrstuûvwxyz";
     </p>
     <p><a href="#toc">Return to table of contents</a>.</p>
     <h2 id='embedding'>Embedding Yioop in an Existing Site</h2>
-    <p>One use-case for Yioop is to use it to serve search result for your
+    <p>One use-case for Yioop is to serve search results for your
     existing site. There are three common ways to do this: (1)
     On your site have a web-form or links with your installation of Yioop
     as their target and let Yioop format the results. (2) Use the
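To illustrate way (1), here is a minimal sketch of a search form you might
place on an existing page. The action URL assumes a Yioop instance at
http://localhost/yioop/ ; Yioop reads the query from the q parameter, as in
the ?q=hello examples later on this page.
<pre>
# append a simple search form to a page on your site; submitting it
# sends the query to Yioop, which formats the results itself
cat >> mypage.html <<'EOF'
<form method="get" action="http://localhost/yioop/">
<input type="text" name="q" />
<input type="submit" value="Search" />
</form>
EOF
</pre>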
@@ -2915,13 +2927,13 @@ all of the Yioop meta words should work so you can do queries like
 kind of language stemmer/char-gramming being used, so French results might be
better if one specifies fr-FR than if one relies on the default en-US.</p>
 <h3 id="code_tool"> A Tool for Coding and Making Patches for Yioop</h3>
-<p>bin/code_tool.php can perform several useful task to help developers
+<p>bin/code_tool.php can perform several useful tasks to help developers
 program for the Yioop environment. Below is a brief summary of its
 functionality:</p>
 <dl>
 <dt>php code_tool.php clean path</dt>
     <dd>Replaces all tabs with four spaces and trims all whitespace off ends of
-    lines in the folder or file path</dd>
+    lines in the folder or file path.</dd>

 <dt>php code_tool.php copyright path</dt><dd>
     Adjusts all lines in the files in the folder at path (or if
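As a quick usage sketch of the two subcommands just listed, run from the
bin folder of a Yioop checkout (the target paths are just example folders
from the source tree):
<pre>
cd yioop/bin
# replace tabs with spaces and trim trailing whitespace in controllers
php code_tool.php clean ../controllers
# adjust the copyright-related lines in files under the whole tree
php code_tool.php copyright ..
</pre>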
diff --git a/en-US/pages/downloads.thtml b/en-US/pages/downloads.thtml
index 5465ad4..3660de6 100755
--- a/en-US/pages/downloads.thtml
+++ b/en-US/pages/downloads.thtml
@@ -2,14 +2,13 @@
 <h2>Yioop Releases</h2>
 <p>The Yioop source code is still at an alpha stage. </p>
 <ul>
+<li><a href="http://www.seekquarry.com/viewgit/?a=archive&amp;p=yioop&amp;h=202a5c5401c983e43015111c634d1b853185a7b6&amp;
+hb=900183a5b581b5555ab3463f968096680eecefa3&amp;t=zip"
+    >Version 0.92-ZIP</a></li>
 <li><a href="http://www.seekquarry.com/viewgit/?
 a=archive&amp;p=yioop&amp;h=3ba7c0901b792891b6b279732e5184668b294e44&amp;
 hb=8b105749c471bbfe97df88e84df8f9c239027a01&amp;t=zip"
     >Version 0.90-ZIP</a></li>
-<li><a href="http://www.seekquarry.com/viewgit/?
-a=archive&amp;p=yioop&amp;h=1be2b50b8436998ce8d2d41f5db3b470610aa817&amp;
-hb=6fc863b1aaf26d8a0abf49a2aad9c7ce440ea307&amp;t=zip"
-    >Version 0.88-ZIP</a></li>
 </ul>
 <h2>Installation</h2>
 <p>The documentation page has information about the
@@ -17,13 +16,14 @@ hb=6fc863b1aaf26d8a0abf49a2aad9c7ce440ea307&amp;t=zip"
 >requirements</a> of and
 <a href="?c=main&amp;p=documentation#installation"
 >installation procedure</a> for Yioop. The
-<a href="?c=main&amp;p=install">Install Guides</a> page
-explains how to get Yioop work in some common settings.</p>
+<a href="?c=main&amp;p=install">Install Guides</a>
+explain how to get Yioop to work in some common settings.</p>
 <h2>Git Repository / Contributing</h2>
 <p>The Yioop git repository allows anonymous read-only access. If you would
 like to contribute to Yioop, just do a clone of the most recent code,
 make your changes, do a pull, and make a patch. For example, to clone the
-repository  assuming you have git, type:</p>
+repository, assuming you have the git version control software
+installed, just type:</p>
 <p><b>git clone https://seekquarry.com/git/yioop.git</b></p>
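A sketch of that clone-and-patch workflow (the patch file name is
arbitrary):
<pre>
git clone https://seekquarry.com/git/yioop.git
cd yioop
# ... make your changes to the code ...
git pull                     # re-sync with the most recent code
git diff > my_changes.patch  # the patch containing your changes
</pre>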
 <p>
 The <a href="?c=main&amp;p=coding">Yioop Coding Guidelines</a> explain
diff --git a/en-US/pages/install.thtml b/en-US/pages/install.thtml
index 663711d..aa0b646 100755
--- a/en-US/pages/install.thtml
+++ b/en-US/pages/install.thtml
@@ -4,9 +4,9 @@
         <li><a href="#wamp">WAMP</a></li>
         <li><a href="#osx">Mac OSX / Mac OSX Server</a></li>
         <li><a href="#ubuntu">Ubuntu Linux</a></li>
-        <li><a href="#centos">Centos Linux</a></li>
+        <li><a href="#centos">Centos Linux (Systems with SELinux)</a></li>
         <li><a href="#cpanel">CPanel</a></li>
-        <li><a href="#multiple">System with Multiple Queue Servers</a></li>
+        <li><a href="#multiple">Systems with Multiple Queue Servers</a></li>
     </ul>

 <h2 id="xampp">XAMPP on Windows</h2>
@@ -17,21 +17,21 @@
 <li>Download <a
     href="http://www.apachefriends.org/en/xampp-windows.html">Xampp</a>
(Note: Yioop! 0.9 or higher works on the latest version;
-Yioop! 0.88 or lower works up till Xampp 1.7.7)</li>
-<li>Install xampp</li>
-<li>Copy PsExec from the pstools zip folder to C:\xampp\php</li>
-<li>Open control panel. Go to System =&gt; Advanced system settings =&gt;
+Yioop! 0.88 or lower works up till Xampp 1.7.7).</li>
+<li>Install xampp.</li>
+<li>Copy PsExec from the pstools zip folder to C:\xampp\php .</li>
+<li>Open Control Panel. Go to System =&gt; Advanced system settings =&gt;
 Advanced. Click on Environment Variables. Look under System Variables and
-select Path. Click Edit. Tack onto the end of the Variable Values:
+select Path. Click Edit. Tack onto the end of the Variable Value:
 <pre>
 ;C:\xampp\php;
 </pre>
-Click OK a bunch times to get rid of windows. Close the control panel window.
+Click OK a bunch of times to get rid of windows. Close the Control Panel window.
 Reopen it and go to the same place to make sure the path variable really
 was changed.
 </li>
-<li>Edit the file C:\xampp\php\php.ini in Notepad. Search on curl:
-change the line:
+<li>Edit the file C:\xampp\php\php.ini in Notepad. Search on curl.
+Change the line:
 <pre>
 ;extension=php_curl.dll
 </pre>
@@ -47,10 +47,10 @@ to
 <pre>
 post_max_size = 32M
 </pre>
-Start Apache. This change is not strictly necessary, but will improve
-performance.</li>
+Start Apache. The post_max_size change is not strictly necessary,
+but will improve performance.</li>
 <li>Download <a href="http://www.seekquarry.com/viewgit/?a=summary&amp;p=yioop"
->Yioop!</a> (you should choose some version &gt; 0.88 or latest)
+>Yioop</a> (You should choose a version &gt; 0.88 or the latest version).
 Unzip it into
 <pre>
 C:\xampp\htdocs
Point your browser at:
 <pre>
 http://localhost/yioop/
 </pre>
-enter under "Search Engine Work Directory", the path
+Under "Search Engine Work Directory", enter the path
 <pre>
-C:/xampp/htdocs/yioop_data
+C:/xampp/htdocs/yioop_data
 </pre>
It will ask you to log into Yioop. Log in with username root and an empty
password.
 </li>
@@ -87,7 +87,7 @@ the installer of Yioop did not customize this to his system.
 Please block this ip.
 </pre>
 </li>
-<li>Go to Manage Machines and added a single machine under Add Machine:
+<li>Go to Manage Machines and add a single machine under Add Machine:
 <pre>
 Machine Name: Local
 Machine Url: http://localhost/yioop/
@@ -97,17 +97,17 @@ Number of Fetchers 1
 Submit
 </pre>
 </li>
-<li>You might need to restart the machine to get the next steps to work</li>
+<li>You might need to restart your computer to get the next steps to work.</li>
 <li>In Manage Machines, click ON on the queue server and on your fetcher.
-For each click
-on the log file and make sure that after at most two minutes you are seeing log
-entries appear.</li>
+For your queue server and your fetcher, click
+on the log file link and make sure that after at most two minutes you are
+seeing new log entries.</li>
 <li>Now go to Manage Crawls. Click on Options.
-Set the options you would like for your crawl,
-click Save.</li>
-<li>Type name of the crawl and start crawl. Let it crawl for a while,
-till you see the Total URls Seen > 1.</li>
-<li>Click stop crawl and waited for the crawl to appear in the previous
+Set the options you would like for your crawl.
+Click Save.</li>
+<li>Type the name of the crawl and start crawl. Let it crawl for a while,
+until you see the Total URLs Seen &gt; 1.</li>
+<li>Click stop crawl and wait for the crawl to appear in the previous
crawls list. Set it as the default crawl. Then you can search using this index.
 </li>
 </ol>
@@ -138,8 +138,8 @@ Unzip it into
 <pre>
 C:\wamp\www
 </pre>
-Rename the downloaded folder yioop (so now have
-a folderC:\wamp\www\yioop).</li>
+Rename the downloaded folder yioop (so you should now have
+a folder C:\wamp\www\yioop).</li>
 <li>Edit php.ini to enable multicurl and change the post_max_size. To do
 this use the Wamp dock tool and navigate to wamp =&gt; php =&gt; extension.
 Turn on curl. Next navigate to wamp =&gt; php =&gt; php.ini .
@@ -166,7 +166,7 @@ Click edit and add to the path variable:
 ;C:\wamp\bin\php\php5.3.10;
 </pre>
 Exit control panel, then re-enter to double check that path really was added
- to end</li>
+ to the end.</li>
 <li> Next go to
 wamp =&gt; apache =&gt; restart service. In a browser, go to
 http://localhost/yioop/ . You should see a configure screen
@@ -206,10 +206,10 @@ Submit
 <li>Under Machine Information turn the Queue Server and Fetcher On.</li>
 <li>Go to Manage Crawls. Click on the options to set up where you want to crawl.
 Type in a name for the crawl and click start crawl.</li>
-<li>Let it crawl for a while, till you see the Total URls Seen &gt; 1.</li>
+<li>Let it crawl for a while, until you see the Total URLs Seen &gt; 1.</li>
 <li>Then click Stop Crawl and wait for the crawl to appear in the previous
 crawls list. Set it as the default crawl. You should be
-able to search using this index
+able to search using this index.
 </li>
 </ol>

@@ -221,7 +221,7 @@ for earlier OSX versions. </p>
 <ol>
 <li>Turn on Apache with PHP enabled.
 <ul>
-<li><b>Not OSX Server:</b> Traditionally, (pre-Mountain Lion) OSX, one
+<li><b>Not OSX Server:</b> Traditionally, on (pre-Mountain Lion) OSX, one
 could go to Control Panel =&gt; Sharing, and turn on Web Sharing to
 get the web server running. This option was removed in Mountain Lion; however,
 from the command line (Terminal), one can type:
@@ -241,10 +241,10 @@ machine is turned on one can type:
 <br />By default, document root is
 /Library/WebServer/Documents. The configuration files for Apache in
 this setting are located in /etc/apache2. If you want to tweak document
-root or other apache settings, look in the folder /etc/apache2/other and
+root or other Apache settings, look in the folder /etc/apache2/other and
 edit appropriate files such as httpd-vhosts.conf or httpd-ssl.conf .
 Before turning on Web Sharing / the
-web server, you would want to edit the file /etc/apache/httpd.conf, replace
+web server, you need to edit the file /etc/apache2/httpd.conf. Replace
 <pre>
 #LoadModule php5_module libexec/apache2/libphp5.so
 </pre>
@@ -254,7 +254,7 @@ LoadModule php5_module libexec/apache2/libphp5.so
 </pre>
 </li>
 <li><b>OSX Server:</b> Pre-mountain lion, OSX Server used /etc/apache2
-to store its configuration files, since Mountain Lion these files are in
+to store its configuration files. Since Mountain Lion, these files are in
 /Library/Server/Web/Config/apache2 . Within this folder, the sites folder
 holds Apache directives for specific virtual hosts. OSX Server comes
 with Server.app which will actively fight any direct tweaking to configuration
@@ -268,7 +268,7 @@ be as you like.
 </li>
 <li>
Modify the php.ini file; this is likely the file /private/etc/php.ini.
-You want to change
+Change
 <pre>
 post_max_size = 8M
 to
@@ -289,7 +289,7 @@ sudo launchctl load -w /System/Library/LaunchDaemons/com.apple.atrun.plist
 <li>For the remainder of this guide, we assume document root for
 the web server is: /Library/WebServer/Documents.
 <a href="http://www.seekquarry.com/viewgit/?a=summary&amp;p=yioop"
->Download Yioop</a>, unpack it into /Library/WebServer/Documents and rename
+>Download Yioop</a>, unpack it into /Library/WebServer/Documents, and rename
 the Yioop folder to yioop.</li>
 <li>Make a folder for your crawl data:
 <pre>
@@ -297,17 +297,21 @@ sudo mkdir /Library/WebServer/Documents/yioop_data
 sudo chmod 777 /Library/WebServer/Documents/yioop_data
 </pre>
 You probably want to make sure Spotlight (Mac's built-in file and folder
-indexer) doesn't index this folder -- especially during a crawl -- or you
-system might really slow down. To prevent this open Control Panel, choose
+indexer) doesn't index this folder -- especially during a crawl -- or your
+system might really slow down. To prevent this, open Control Panel, choose
 Spotlight, select the Privacy tab, and add the above folder to the list
of folders Spotlight shouldn't index. If you are storing crawls on an
 external drive, you might want to make sure that drive gets automounted
-without a login, in the event of a power failure that exceeds your backup power
-supply time. To do this you can write the preference:
+without a login. This is useful in the event of a power failure that exceeds
+your backup power supply time. To do this you can write the preference:
 <div><br /><tt>
 sudo defaults write /Library/Preferences/SystemConfiguration</tt
 ><tt>/autodiskmount AutomountDisksWithoutUserLogin -bool true</tt>
 </div><br />
+This will mean the hard drive becomes available when the power comes back.
+To make your Mac restart when the power is back, under
+System Preferences =&gt; Energy Saver there is
+a check box next to "Start up automatically after a power failure". Check it.
 </li>
 <li>In a browser, go to the page http://localhost/yioop/ .
 You should see a configure screen
@@ -346,7 +350,7 @@ Submit
 <li>Under Machine Information turn the Queue Server and Fetcher On.</li>
 <li>Go to Manage Crawls. Click on the options to set up where you want to crawl.
 Type in a name for the crawl and click start crawl.</li>
-<li>Let it crawl for a while, till you see the Total URls Seen &gt; 1.</li>
+<li>Let it crawl for a while, until you see the Total URLs Seen &gt; 1.</li>
 <li>Then click Stop Crawl and wait for the crawl to appear in the previous
 crawls list. Set it as the default crawl. You should be
 able to search using this index.
@@ -370,10 +374,10 @@ sudo apt-get install php5-gd
 <li>After this sequence, the files /etc/apache2/mods-enabled/php5.conf
 and /etc/apache2/mods-enabled/php5.load should exist and link
 to the corresponding files in /etc/apache2/mods-available. The configuration
-files for php are /etc/php5/apache2/php.ini (for the apache module)
+files for PHP are /etc/php5/apache2/php.ini (for the Apache module)
 and /etc/php5/cli/php.ini (for the command-line interpreter).
 You want to make changes to both configurations. Using your favorite
-texteditor, vi, nano, gedit, etc., modify the line:
+text editor, ed, vi, nano, gedit, etc., modify the line:
 <pre>
 post_max_size = 8M
 to
@@ -447,7 +451,7 @@ Submit
 <li>Under Machine Information turn the Queue Server and Fetcher On.</li>
 <li>Go to Manage Crawls. Click on the options to set up where you want to crawl.
 Type in a name for the crawl and click start crawl.</li>
-<li>Let it crawl for a while, till you see the Total URls Seen &gt; 1.</li>
+<li>Let it crawl for a while, until you see the Total URLs Seen &gt; 1.</li>
 <li>Then click Stop Crawl and wait for the crawl to appear in the previous
 crawls list. Set it as the default crawl. You should be
 able to search using this index.
@@ -455,16 +459,15 @@ able to search using this index.
 </ol>

 <h2 id="centos">Centos Linux</h2>
-<p>These instructions were tested running
+<p>These instructions were tested running a
 <a href="http://virtualboxes.org/images/centos/">Centos 6.3 image</a> in
 <a href="https://www.virtualbox.org/">VirtualBox</a>. The keyboard settings
 for the particular image on the VirtualBox site are Italian, so you will
 have to tweak them to get an American keyboard or the keyboard you are most
-comfortable with. To get started log in as user centos, and then
-launched a terminal window and su root.
+comfortable with. To get started, log in, launch a terminal window, and su root.
 </p>
 <ol>
-<li>The image doesn't have Apache installed or the nano editor.
+<li>The image we were using doesn't come with Apache or the nano editor
+installed.
 These can be installed with the commands:<br />
 <pre>
 yum install httpd
@@ -476,7 +479,7 @@ under is in the list of sudoers.
 </li>
 <li>
 Apache's configuration files are in the /etc/httpd directory. To
-get of the default web landing page, we switch into the conf.d subfolder
+get rid of the default web landing page, we switch into the conf.d subfolder
 and disable welcome.conf. To do this, first type the commands:
 <pre>
 cd /etc/httpd/conf.d
@@ -487,11 +490,11 @@ Then using the editor put #'s at the start of each line and save the result.
 <li>Yioop needs to be able to issue shell commands to start and stop
 machines. In particular, it uses the "at daemon" (atd) to do this.
 The web server on Centos runs as user Apache and by default its shell is
-specified as noshell. Also, Centos makes use of SELinux and the domain
+specified as /sbin/nologin. Also, Centos makes use of SELinux and the domain
 under which Apache runs prevents it from issuing at commands as well.
 You probably want to use audit2allow and semanage to configure exactly
 the settings you need to get Yioop! to run. For the purposes of expediency
-however one can type:
+(maybe faster to get fired), however, one can type:
 <pre>
 usermod -s /bin/sh apache
 chcon -t unconfined_exec_t /usr/sbin/httpd
@@ -509,8 +512,8 @@ yum install php-gd
 </pre>
 </li>
 <li>The default Apache DocumentRoot under Centos is /var/www/html. We will
-install Yioop in a folder /var/www/html/yioop. This could be accessed
-by pointing a browser at http://127.0.0.1/yioop/
+install Yioop in a folder /var/www/html/yioop. This can be accessed
+by pointing a browser at http://127.0.0.1/yioop/ .
 To download Yioop to /var/www/html/yioop and to create a work directory,
 we run the commands:
 <pre>
@@ -575,7 +578,7 @@ Submit
 <li>Under Machine Information turn the Queue Server and Fetcher On.</li>
 <li>Go to Manage Crawls. Click on the options to set up where you want to crawl.
 Type in a name for the crawl and click start crawl.</li>
-<li>Let it crawl for a while, till you see the Total URls Seen &gt; 1.</li>
+<li>Let it crawl for a while, until you see the Total URLs Seen &gt; 1.</li>
 <li>Then click Stop Crawl and wait for the crawl to appear in the previous
 crawls list. Set it as the default crawl. You should be
 able to search using this index.
@@ -586,7 +589,7 @@ able to search using this index.
 <p>
 Generally, it is not practical to do your crawling in a cPanel hosted website.
 However, cPanel works perfectly fine for hosting the results of a crawl you did
-elsewhere. Here it is briefly described how to do this. In capacity planning,
+elsewhere. Here we briefly describe how to do this. In capacity planning
 your installation, as a rule of thumb, you should
 expect your index to be of comparable size (number of bytes) to the sum of
 the sizes of the pages you downloaded.
@@ -666,17 +669,18 @@ yioop_data/cache.
 </pre>
 Upload the ZIP and extract it.</li>
 <li>Go to Manage Crawls on this instance of Yioop,
-locate this crawl under Previous Crawls and set it as the default crawl.
+locate this crawl under Previous Crawls, and set it as the default crawl.
 You should now be able to search and get results from the crawl.
 </li>
 </ol>
 <p>
 You will probably want to uncheck Cache in the Configure activity as in this
-hosted setting it is somewhat hard to get the cache page feature of Yioop to
+hosted setting it is somewhat hard to get the cache page feature (where
+it lets users see complete caches of web pages by clicking a link) of Yioop to
 work.
 </p>

-<h2 id="multiple">System with Multiple Queue Servers</h2>
+<h2 id="multiple">Systems with Multiple Queue Servers</h2>
 <p>
 This section assumes you have already successfully installed and performed
 crawls with Yioop in the single queue_server setting and have succeeded to use
@@ -691,13 +695,13 @@ Before we begin, what are the advantages in using more than one queue_server?
 <ol>
 <li>If the queue_servers are running on different processors then they can each
 be indexing part of the crawl data independently and so this can speed up
-indexing</li>
+indexing.</li>
 <li>After the crawl is done, the index will typically exist on multiple
 machines and each needs to search a smaller amount of data before sending it to
 the name server for final merging. So queries can be faster.</li>
 </ol>
 <p>
-For the purposes of this post we will consider the case of two queue_servers,
+For the purposes of this note we will consider the case of two queue_servers;
 the same idea works for more. To keep things especially simple, we have both of
  these queue_servers on the same laptop. Advantages
 (1) and (2) will likely not apply in this case, but we are describing this
@@ -726,23 +730,23 @@ On the Configure element of the git/yioop1 instance, set the work directory
 to be something like
 <pre>
 /Applications/XAMPP/xamppfiles/htdocs/crawls1
-<pre>
-for the git/yioop2 instance we set it to be
+</pre>
+For the git/yioop2 instance we set it to be
 <pre>
 /Applications/XAMPP/xamppfiles/htdocs/crawls2
 </pre>
-i.e., the work directories of these two instances should be different!
+That is, the work directories of these two instances should be different!
 For each crawl in the multiple queue_server setting, each instance will
 have a copy of those documents it is responsible for. So if we did a crawl with
 timestamp 10, each instance would have a WORK_DIR/cache/IndexData10
-folder and these folders would be disjoint in its contents from any other
+folder, and these folders' contents would be disjoint from those of any other
 instance.
 </li>
 <li>
 Continuing down on the Configure element for each instance, make sure under the
 Search Access fieldset Web, RSS, and API are checked.</li>
 <li>Next make sure the name server and server key are the same for both
-instances. i.e., In the Name Server Set-up fieldset, one might set:
+instances. For example, in the Name Server Set-up fieldset, one might set:
 <pre>
 Server Key:123
 Name Server URL:http://localhost/git/yioop1/
@@ -752,14 +756,14 @@ Name Server URL:http://localhost/git/yioop1/
 The Crawl Robot Name should also be the same for the two instances, say:
 <pre>
 TestBotFeelFreeToBan
-<pre>
+</pre>
 but we want the robot instance to be different, say 1 and 2.
 </li>
 <li>Go to the Manage Machine element for git/yioop1, which is the name server.
 Only the name server needs to manage machines,
 so we won't do this for git/yioop2 (or for any other queue servers
 if we had them).</li>
-<li>Add machines for each yioop instance we want to manage with the name server.
+<li>Add machines for each Yioop instance we want to manage with the name server.
 In this particular case, fill out and submit the Add Machine form twice,
 the first time with:
 <pre>
@@ -787,7 +791,7 @@ Num Fetchers (fetchers are the processes that download web pages).
 After the above steps, there should be two machines listed under
 Machine Information.  Click the On button on the queue server and the
fetcher of both of them. They should turn green. If you click the log link
-you should start seeing new  messages (it refreshes once every 30 secs) after
+you should start seeing new messages (it refreshes once every 30 seconds) after
 at most a minute or so.
 </li>
 <li>
Machine Subtimes: ID_0 and ID_1. If these appear, you know it's working.
 </li>
 </ol>
 <p>When a query is typed into the name server, it tacks no:network onto it
-and asks it of all the queue servers, then merges the results.
+and asks it of all the queue servers. It then merges the results.
 So if you type "hello" as the search, i.e., if you go to the url
 <pre>
 http://localhost/git/yioop1/?q=hello
@@ -825,7 +829,7 @@ http://localhost/git/yioop1/?q=hello&amp;ne ... alse&amp;raw=1
     (raw=1 means no grouping)
 http://localhost/git/yioop2/?q=hello&amp;ne ... alse&amp;raw=1
 </pre>
-get the results back and merge them and finally return to the user the result.
-The network=false tells http://localhost/git/yioop1/ to actually do the query
-lookup rather than make a network request.
+gets the results back, and merges them. Finally, it returns the
+result to the user. The network=false tells http://localhost/git/yioop1/ to
+actually do the query lookup rather than make a network request.
 </p>
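As a sketch, the fan-out just described can be reproduced by hand with the
two-instance setup from this guide; the elided query strings above contain
the network=false and raw=1 parameters mentioned in the text.
<pre>
# what the name server does internally for the query "hello"
curl 'http://localhost/git/yioop1/?q=hello&network=false&raw=1'
curl 'http://localhost/git/yioop2/?q=hello&network=false&raw=1'
# the name server merges these per-instance result lists before
# returning the final results page to the user
</pre>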