Add install instructions for HipHop and Debian, a=chris

Chris Pollett [2013-04-13 21:Apr:th]
Filename
en-US/pages/documentation.thtml
en-US/pages/downloads.thtml
en-US/pages/install.thtml
en-US/pages/resources.thtml
diff --git a/en-US/pages/documentation.thtml b/en-US/pages/documentation.thtml
index a3340c1..f4dbf5e 100755
--- a/en-US/pages/documentation.thtml
+++ b/en-US/pages/documentation.thtml
@@ -507,7 +507,7 @@ files and both of these must be changed.</p>
     sudo apt-get install php5-curl
     sudo apt-get install php5-gd
     </pre>
-    <p>For both Mac and Linux, you need to alter the post_max_size
+    <p>For both Mac and Linux, you might want to alter the post_max_size
     variable in your php.ini file as in the Windows case above.</p>
     <p>In addition to the minimum installation requirements above, if
     you want to use the Manage Machines feature in Yioop, you might need
diff --git a/en-US/pages/downloads.thtml b/en-US/pages/downloads.thtml
index 82ac2df..f018cd1 100755
--- a/en-US/pages/downloads.thtml
+++ b/en-US/pages/downloads.thtml
@@ -2,9 +2,9 @@
 <h2>Yioop Releases</h2>
 <p>The two most recent versions of Yioop are:</p>
 <ul>
-<li><a href="http://www.seekquarry.com/viewgit/?a=archive&amp;p=yioop&h=b147860d56e941ba2925036589c08c5d380ec71d&amp;
-hb=f7b96d54b1c35ff6dabaee3e832d13b6e816bb35&amp;t=zip"
-    >Version 0.94-ZIP</a></li>
+<li><a href="http://www.seekquarry.com/viewgit/?a=archive&amp;p=yioop&amp;h=714e33c174a3201c0b35118df05faeaccf71c34a&amp;
+hb=ba6ab2a825d58af3fa7465ae26bdc9e292a49468&amp;t=zip"
+    >Version 0.941-ZIP</a></li>
 <li><a href="http://www.seekquarry.com/viewgit/?a=archive&p=yioop&amp;h=da73fb8ad24ba67201a3cccaa6290d711f505ef3&amp;
 hb=fb79c4c0b11379bee3b8c4c803f9f938a9001c16&amp;t=zip"
     >Version 0.921-ZIP</a></li>
diff --git a/en-US/pages/install.thtml b/en-US/pages/install.thtml
index 7f1a677..daab407 100755
--- a/en-US/pages/install.thtml
+++ b/en-US/pages/install.thtml
@@ -3,9 +3,10 @@
         <li><a href="#xampp">XAMPP on Windows</a></li>
         <li><a href="#wamp">WAMP</a></li>
         <li><a href="#osx">Mac OSX / Mac OSX Server</a></li>
-        <li><a href="#ubuntu">Ubuntu Linux</a></li>
+        <li><a href="#ubuntu">Ubuntu Linux / Debian</a></li>
         <li><a href="#centos">Centos Linux (Systems with SELinux)</a></li>
         <li><a href="#cpanel">CPanel</a></li>
+        <li><a href="#hiphop">HipHop</a></li>
         <li><a href="#multiple">Systems with Multiple Queue Servers</a></li>
     </ul>

@@ -326,6 +327,22 @@ post_max_size = 32M
 </pre>
 This change is not strictly necessary, but will improve performance.
 </li>
+<li>Debian's (not Ubuntu's) PHP version has the
+suhosin hardening patch enabled by default. In Yioop versions before
+0.941, this caused problems because Yioop made mt_srand calls which were
+ignored. To fix this, you should add the following to
+the end of both php.ini files listed above (alternatively, you could
+add it to /etc/php5/apache2/conf.d/suhosin.ini and
+/etc/php5/cli/conf.d/suhosin.ini):
+<pre>
+suhosin.srand.ignore = Off
+suhosin.mt_srand.ignore = Off
+</pre>
+This modification is not needed for Version 0.941 and higher.
+Suhosin hardening also adds a second place where the size of HTTP POST
+requests is limited. You should also set suhosin.post.max_value_length to
+the same value you set for post_max_size.
+</li>
 <li>Looking in the folders /etc/php5/apache2/conf.d and
 /etc/php5/cli/conf.d you can see which extensions are being loaded
 by php. Look for files curl.ini, gd.ini, sqlite.ini to know these
@@ -349,6 +366,12 @@ sudo mkdir /var/www/yioop_data
 sudo chmod 777 /var/www/yioop_data
 </pre>
 </li>
+<li>Next, set the permissions on config.php so that the web server
+can set the work directory location. We'll brute force this as:
+<pre>
+sudo chmod 777 /var/www/yioop/configs/config.php
+</pre>
+</li>
 <li>In a browser, go to the page http://localhost/yioop/ .
 You should see a configure screen
 where you can enter /var/www/yioop_data for the Work Directory. It
@@ -458,7 +481,7 @@ php configure_tool.php
 select option (1) Create/Set Work Directory
 enter /var/www/html/yioop_data
 then select option (1) to confirm the change.
-exit program
+Exit the program.
 </pre>
 </li>
 <li>In a browser, go to the page http://localhost/yioop/ .
@@ -595,6 +618,193 @@ hosted setting it is somewhat hard to get the cache page feature  (where
 it let's users see complete caches of web-page by clicking a link) of Yioop to
 work.
 </p>
+<h2 id="hiphop">HipHop</h2>
+<p><a href="https://github.com/facebook/hiphop-php/wiki">HipHop</a>
+is Facebook's open-source virtual machine for executing PHP.
+It can offer a significant speed-up in performance over running
+the traditional PHP interpreter. Yioop runs under HipHop
+with the following limitations: (1) The Yioop page processors
+for epub, pptx, and xlsx files make use of the ZipArchive class not
+supported by HipHop. (2) The Yioop recipe plugin makes use of SplHeap
+not supported by HipHop. In the former case you should uncheck these
+file extensions in Page Options. In the latter case, Yioop will automatically
+disable the recipe plugin, so you don't need to make any changes --
+if you had crawled something using this plugin elsewhere, you can still serve
+the results using HipHop though. For the remainder of this section,
+we describe how to get Yioop up and running under Ubuntu 12.04 LTS using
+HipHop.
+</p>
+<ul>
+<li>To begin, get HipHop from GitHub. To do this, add the HipHop repository
+to the apt-get sources list by adding this line to /etc/apt/sources.list:
+<pre>
+deb http://dl.hiphop-php.com/ubuntu precise main
+</pre>
+then update the package index:
+<pre>
+sudo apt-get update
+</pre>
+and install HipHop using apt-get:
+<pre>
+sudo apt-get install hiphop-php
+</pre>
+</li>
+<li>Set up a HipHop configuration file /etc/hhvm.hdf:
+<pre>
+Server {
+  Port = 8080
+  SourceRoot = /var/www/yioop
+}
+
+Eval {
+  Jit = true
+}
+Log {
+  Level = Error
+  UseLogFile = true
+  File = /var/log/hhvm/error.log
+  Access {
+    * {
+      File = /var/log/hhvm/access.log
+      Format = %h %l %u %t \"%r\" %>s %b
+    }
+  }
+}
+
+VirtualHost {
+  * {
+    Pattern = .*
+    RewriteRules {
+      dirindex {
+        pattern = ^/(.*)/$|^/$
+        to = $1/index.php
+        qsa = true
+      }
+    }
+  }
+}
+
+StaticFile {
+  FilesMatch {
+    * {
+      pattern = .*\.(dll|exe)
+      headers {
+        * = Content-Disposition: attachment
+      }
+    }
+  }
+  Extensions {
+    css = text/css
+    gif = image/gif
+    html = text/html
+    jpe = image/jpeg
+    jpeg = image/jpeg
+    jpg = image/jpeg
+    png = image/png
+    tif = image/tiff
+    tiff = image/tiff
+    txt = text/plain
+  }
+}
+</pre>
+Notice this is running on port 8080 -- when I was testing this, I had something
+else running on port 80. If you want to use the more common port 80, modify
+the above accordingly. When tracking down errors,
+it is often convenient to look at the error.log file by running:
+<pre>
+tail -n 500 /var/log/hhvm/error.log
+</pre>
+This is the location specified by the configuration file; however, the directory
+/var/log/hhvm does not exist by default so you should create it:
+<pre>
+sudo mkdir /var/log/hhvm
+</pre>
+Most of the configuration file above comes from
+the <a href="http://www.hiphop-php.com/wp/?p=113">HipHop Blog Entry for
+WordPress Installation</a>. I tweaked the rewrite rule that selects the
+default index file.
+</li>
+<li>Start the HipHop virtual machine daemon:
+<pre>
+sudo hhvm --mode daemon --user web --config /etc/hhvm.hdf
+</pre>
+</li>
+<li><a href="http://www.seekquarry.com/viewgit/?a=summary&amp;p=yioop"
+>Download Yioop</a> and unpack it into /var/www. If you didn't
+install apache2, you might need to use mkdir to create this folder first. Next,
+use mv to rename the Yioop folder to yioop.</li>
+<li>Make a folder for your crawl data:
+<pre>
+sudo mkdir /var/www/yioop_data
+sudo chmod 777 /var/www/yioop_data
+</pre>
+</li>
+<li>Tell Yioop where its work directory is:
+<pre>
+cd /var/www/yioop/configs
+sudo hhvm -f configure_tool.php
+
+select option (1) Create/Set Work Directory
+enter /var/www/yioop_data
+then select option (1) to confirm the change.
+Exit the program.
+</pre>
+Notice that to run the PHP program above we did not have to install PHP;
+we ran it directly using HipHop from the command line. The -f
+option specifies the file we'd like to run.
+</li>
+<li>In a browser, go to the page http://localhost:8080/ .
+You should see a configure screen
+where you can enter /var/www/yioop_data for the Work Directory. It
+will ask you to re-login. Use the login: root and no password.
+You can safely ignore the warning:
+<pre>
+The following required items were missing: PHP Version 5.3 or Newer
+</pre>
+Now go to Yioop =&gt;
+Configure and input the following settings:
+<pre>
+Search Engine Work Directory: /var/www/yioop_data
+Default Language: English
+Crawl Robot Name: TestBot
+Robot Description: This bot is for test purposes. It respects robots.txt
+If you are having problems with it please feel free to ban it.
+</pre>
+Crawl robot name is what will appear together with a url to a bot.php
+page in web server log files of sites you crawl. The bot.php page will display
+what you write in robot description. This should give contact information
+in case your robot misbehaves. Obviously, you should customize
+the above to what you want to say.
+</li>
+<li>Click [Toggle Advanced Settings] on the configure page. For
+the Name Server URL set it to:
+<pre>
+http://localhost:8080/
+</pre>
+If you didn't use port 8080, but instead the usual port 80, you would not
+have to do this step.
+</li>
+<li>Go to Manage Machines. Add a single machine under Add Machine using the
+settings:
+<pre>
+Machine Name: Local
+Machine Url: http://localhost:8080/
+Is Mirror: (uncheck)
+Has Queue Server: (check)
+Number of Fetchers: 1
+Submit
+</pre>
+</li>
+<li>Under Machine Information turn the Queue Server and Fetcher On.</li>
+<li>Go to Manage Crawls. Click on the options to set up where you want to crawl.
+Type in a name for the crawl and click start crawl.</li>
+<li>Let it crawl for a while, until you see the Total URLs Seen &gt; 1.</li>
+<li>Then click Stop Crawl and wait for the crawl to appear in the previous
+crawls list. Set it as the default crawl. You should be
+able to search using this index.
+</li>
+<li>If you prefer to run the </li>
+</ul>

 <h2 id="multiple">Systems with Multiple Queue Servers</h2>
 <p>
diff --git a/en-US/pages/resources.thtml b/en-US/pages/resources.thtml
index 31d054c..d60747e 100755
--- a/en-US/pages/resources.thtml
+++ b/en-US/pages/resources.thtml
@@ -2,6 +2,7 @@
 <ul>
 <li><a href="http://www.seekquarry.com/phpBB/">Discussion Boards</a></li>
 <li><a href="?c=main&amp;p=install">Install Guides</a></li>
+<li><a href="?c=main&amp;p=ranking">Yioop Ranking Mechanisms</a></li>
 <li><a href="?c=main&amp;p=coding">Coding Guidelines</a></li>
 <li><a href="http://www.seekquarry.com/mantis/">Issue Tracking</a></li>
 <li><a href="http://www.seekquarry.com/yioop-docs/"
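A note on the HipHop steps added to install.thtml: two of them (adding the repository line to /etc/apt/sources.list and creating /var/log/hhvm) can be made safe to repeat. The sketch below is one possible way to do that, not something taken from Yioop or HipHop. The helper names `ensure_line` and `ensure_dir` are hypothetical and were made up for illustration; the paths and the repository line are the ones given in the instructions.

```shell
#!/bin/sh
# Hypothetical helper (not part of Yioop or HipHop): append LINE to FILE
# only if FILE does not already contain that exact line.
ensure_line() {
    file=$1
    line=$2
    # -q: quiet, -x: match the whole line, -F: fixed string, no regex
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Hypothetical helper: create a directory (and any missing parents);
# does nothing if the directory already exists.
ensure_dir() {
    mkdir -p -- "$1"
}

# Example use as root, with the values from the install instructions:
#   ensure_line /etc/apt/sources.list "deb http://dl.hiphop-php.com/ubuntu precise main"
#   ensure_dir /var/log/hhvm
```

Running either helper a second time leaves the system unchanged, so the setup can be re-run after a partial failure without duplicating the sources.list entry.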