Add documentation for other configuration settings, a=chris

Chris Pollett [2014-06-27 17:Jun:th]
Filename
en-US/pages/documentation.thtml
diff --git a/en-US/pages/documentation.thtml b/en-US/pages/documentation.thtml
index 9064e25..0ce5bdb 100755
--- a/en-US/pages/documentation.thtml
+++ b/en-US/pages/documentation.thtml
@@ -14,7 +14,7 @@
         <li><a href="#requirements">Requirements</a></li>
         <li><a href="#installation">Installation and Configuration</a></li>
         <li><a href="#optional">Optional Server and Security
-            Configurations</a></li
+            Configurations</a></li>
         <li><a href="#upgrade">Upgrading Yioop</a></li>
         <li><a href="#files">Summary of Files and Folders</a></li>
     </ul>
@@ -24,8 +24,8 @@
         <li><a href="#search-basic">Search Basics</a></li>
         <li><a href="#operators">Search Operators</a></li>
         <li><a href="#result-formats">Result Formats</a></li>
-        <li><a href="#settings">Settings</a></li>
         <li><a href="#tools">Search Tools Page</a></li>
+        <li><a href="#settings">Settings</a></li>
         <li><a href="#mobile">Mobile Interface</a></li>
     </ul>
     <li><a href="#social"><b>User Accounts and Social Features</b></a>
@@ -867,23 +867,108 @@ with a Yioop installation might have different user accounts set-up after
 changing database information you might have to sign in again.
 </p>

-<p>The <b>Account Registration</b> fieldset </p>
+<p>The <b>Account Registration</b> fieldset is used to control how
+users can obtain accounts on a Yioop installation. The dropdown
+at the start of this fieldset allows one to select one of four possibilities:
+Disable Registration, users cannot register themselves, only the root account
+can add users; No Activation, user accounts are immediately activated
+once a user signs up; Email Activation, after registering, users must
+click on a link which comes in a separate email to activate their accounts;
+and Admin Activation, after registering, an admin account must activate the
+user before the user is allowed to use their account.
+When Disable Registration is selected, the Suggest A Url form and
+link on the tool.php page are disabled as well; for all other registration
+types this link is enabled. If Email Activation is chosen, then the
+rest of this fieldset can be used to specify the email address from which
+the email is sent to the user. The checkbox Use PHP mail() function controls
+whether to use the mail function in PHP to send the mail, this only works
+if mail can be sent from the local machine. Alternatively, if this is
+not checked like in the image above, one can configure an outgoing SMTP
+server to send the email through.</p>

-<p>The <b>Proxy Server</b> fieldset </p>
+<p>The <b>Proxy Server</b> fieldset is used to control which proxies to
+use while crawling. By default Yioop does not use any proxies while crawling.
+A Tor Proxy can serve as a gateway to the <a
+href="https://en.wikipedia.org/wiki/Tor_%28anonymity_network%29">Tor
+ Network</a>. Yioop can use this proxy to download .onion URLs on the Tor
+network. The configuration given in the example above works with the Tor
+Proxy that comes with the <a
+href="https://www.torproject.org/projects/torbrowser.html">Tor Browser</a>.
+This proxy needs to be running, though, for Yioop to make use of it.
+Beneath the Tor Proxy input field is a checkbox labelled Crawl via Proxies.
+Checking this box will reveal a textarea labelled Proxy Servers. You
+can enter the address:port or address:port:proxytype of proxy servers you
+would like to crawl through. If proxy servers are used, Yioop will make any
+requests to download pages to a randomly chosen server on the list which
+will proxy the request to the site which has the page to download. To some
+degree this can make the download site think the request is coming from a
+different IP (and potentially location) than it actually is. In practice,
+servers can often use HTTP headers to guess that a proxy is being used.
+</p>

 <p>The Security activity looks like:</p>
 <img src='resources/Security.png' alt='The Security Activity'/>
+<p>The <b>Authentication Type</b> fieldset is used to control the protocol
+used to log people into Yioop. This can either be Normal Authentication,
+passwords are checked against stored salted hashes of the password; or
+ZKP (zero knowledge protocol) authentication, the server picks challenges
+at random and sends these to the browser the person is logging in from,
+the browser computes based on the password an appropriate response according
+to the <a
+href="https://en.wikipedia.org/wiki/Feige<?php ?>
+%E2%80%93Fiat%E2%80%93Shamir_identification_scheme">Fiat Shamir protocol</a>.
+The password is never sent over the internet and is not stored on the server.
+These are the main advantages of ZKP; its drawback is that it is slower
+than Normal Authentication, as proving who you are with a low probability of
+error requires several browser-server exchanges. You should choose which
+authentication scheme you want before you create many users because, if you
+switch, everyone will need to get a new password.
+</p>
+<p>The <b>Captcha Type</b> fieldset controls what kind of
+<a href="https://en.wikipedia.org/wiki/Captcha">captcha</a> will be
+used during account registration, password recovery, and if a user wants
+to suggest a url. The captcha type only has an effect if under the
+Server Settings activity, Account Registration is not set to
+Disable Registration. The choices for captcha are: Text Captcha, the user has to
+select from a series of dropdown answers to questions of the form: Which in
+the following list is the most/largest/etc? or Which in the following list is
+the least/smallest/etc?; Graphic Captcha, the user needs to enter a sequence
+of characters from a distorted image; and Hash Captcha, the user's browser
+(the user doesn't need to do anything) needs to extend a random string with
+additional characters to get a string whose hash begins with a certain lead
+set of characters. Of these, Hash Captcha is probably the least intrusive but
+requires Javascript and might run slowly on older browsers. A text captcha
+might be used to test domain expertise of the people who are registering for
+an account. Finally, the graphic captcha is probably the one people are most
+familiar with.
+</p>
+<p>The Captcha and Recovery Questions section of the Security activity
+provides links to edit the Text Captcha and Recovery Questions for the
+current locale (you can change the current locale in Settings).
+In both cases, there is a fixed list of tests you can localize.
+A single test consists of a more question, a less question, and a
+comma-separated list of possibilities. For example,
+</p>
+<pre>
+Which lives or lasts the longest?
+Which lives or lasts the shortest?
+lightning,bacteria,ant,dog,horse,person,oak tree,planet,star,galaxy
+</pre>
+<p>When challenging a user, Yioop picks a subset of tests. For each test,
+it randomly chooses between the more and the less question. It then picks
+a subset of the ordered list of choices, randomly permutes them, and presents
+them to the user in a dropdown.</p>
     <p><a href="#toc">Return to table of contents</a>.</p>

     <h3 id='upgrade'>Upgrading Yioop</h3>
 <p>If you have an older version of Yioop that you would like to upgrade,
  make sure to back up your data. Then download the latest
 version of Yioop and unzip it to the location you would like. Set the
-Search Engine Work Directory by the same method and to the same value as
+Search Engine Work Directory by the same method and to the same value as
 your old Yioop Installation. See the Installation section above for
 instructions on this, if you have forgotten how you did this.
-Knowing the old Work Directory location, should
-allow Yioop to complete the upgrade process.</p>
+Knowing the old Work Directory location should allow Yioop to complete or
+instruct you how to complete the upgrade process.</p>
     <p><a href="#toc">Return to table of contents</a>.</p>
     <h3 id='files'>Summary of Files and Folders</h3>
     <p>The Yioop search engine consists of three main
@@ -926,9 +1011,9 @@ the Yioop folder's various sub-folders contain:
 <dl>
 <dt>bin</dt><dd>This folder is intended to hold command-line scripts
 and daemons which are used in conjunction with Yioop.
-In addition to the fetcher.php and queue_server.php script already mentioned,
+In addition to the fetcher.php and queue_server.php script already mentioned,
 it contains arc_tool.php, classifier_tool.php, classifier_trainer.php,
-code_tool.php, mirror.php, news_updater.php and query_tool.php. arc_tool.php
+code_tool.php, mirror.php, news_updater.php and query_tool.php. arc_tool.php
 can be used to examine the contents of WebArchiveBundle's and
 IndexArchiveBundle's from the command line. classifier_tool.php
 is a command line tool for creating a classifier it can be used to perform
@@ -961,8 +1046,8 @@ until you decide to change these. The file token_tool.php is a tool which can
 be used to help in term extraction during crawls and for making trie's
 which can be used for word suggestions for a locale. To help word extraction
 this tool can generate in a locale folder (see below) a word bloom filter.
-This filter can be used to segment strings into words for languages such as
-Chinese that don't use spaces to separate words in sentences.
+This filter can be used to segment strings into words for languages such as
+Chinese that don't use spaces to separate words in sentences.
 For trie and segmenter filter construction, this tool can use a file that lists
 words one on a line.
 </dd>
@@ -972,12 +1057,16 @@ coming into Yioop go through the top level index.php file. The query
 string (the component of the url after the ?) then says who is responsible
 for handling the request. In this query string there is a part which reads
 c= ... This says which controller should be used. The controller uses
-the rest of the query string such as the arg= variable to determine
+the rest of the query string, such as the a= variable for the activity function
+to call and the arg= variable, to determine
 which data must be retrieved from which models, and finally which view
-with what elements on it should be displayed back to the user.</dd>
+with what elements on it should be displayed back to the user. Within
+the controller folder is a sub-folder components; a component is a collection
+of activities which may be added to a controller so that it can handle a
+request.</dd>
 <dt>css</dt><dd>This folder contains the stylesheets used to control
 how web page tags should look for the Yioop site when rendered in a
-browser</dd>
+browser.</dd>
 <dt>data</dt><dd>This folder contains a default sqlite database for a new Yioop
 installation. Whenever the WORK_DIRECTORY is changed it is this database
 which is initially copied into the WORK_DIRECTORY to serve as the database
@@ -1026,9 +1115,11 @@ between roman alphabet and the character system of the locale in question;
 <i>suggest-trie.txt.gz</i>, a <a href="http://en.wikipedia.org/wiki/Trie"
 >Trie data structure</a> used for search bar word suggestions;
 and <i>tokenizer.php</i>, which can specify the number of characters for
-this language to constitute a char gram or might contain segmenter to split
-strings into words for this language or a stemmer class used to stem terms for
-this language.
+this language to constitute a char gram, might contain a segmenter to split
+strings into words for this language, a stemmer class used to stem terms for
+this language, a stopword remover for the centroid
+summarizer, a part of speech tagger, or a thesaurus lookup procedure for the
+locale.
 </dd>
 <dt>models</dt><dd>This folder contains the subclasses of Model used by
 Yioop Models are used to encapsulate access to secondary storage.
@@ -1038,15 +1129,20 @@ than one table or across serveral files. The models folder has
 within it a datasources folder. A datasource is an abstraction layer
 for the particular filesystem and database system that is being used
 by a Yioop installation. At present, datasources have been defined
-for PDO (PHP's generic DBMS interface), sqlite, sqlite3, and mysql databases.
+for PDO (PHP's generic DBMS interface), sqlite3, and mysql databases.
 </dd>
 <dt>resources</dt><dd>Used to store binary resources such as graphics, video,
 or audio. For now, just stores the Yioop logo.</dd>
 <dt>scripts</dt><dd>This folder contains the Javascript files used by Yioop.
 </dd>
-<dt>tests</dt><dd>This folder contains UnitTest's for various lib
-components. Yioop comes with its own minimal UnitTest class which is
-defined in the lib/unit_test.php.</dd>
+<dt>tests</dt><dd>This folder contains UnitTest's and JavascriptUnitTests
+for various lib and script components. Yioop comes with its own minimal
+UnitTest and JavascriptUnitTest classes which are defined in lib/unit_test.php
+and lib/javascript_unit_test.php. It also contains a few files used for
+experiments. For example, string_cat_experiment.php was used
+to test which was the faster way to do string concatenation in PHP.
+many_user_experiment.php can be used to create a test Yioop installation
+with many users, roles, and groups.</dd>
 <dt>views</dt><dd>This folder contains View subclasses as well
 as folders for elements, helpers, and layouts. A View is
 responsible for taking data given to it by a controller and formatting it
@@ -1103,7 +1199,10 @@ QueueBundleUNIX_TIMESTAMP folders.</dd>
 <dt>data</dt><dd>If an sqlite or sqlite3 (rather than say MySQL) database is
 being used then a seek_quarry.db file is stored in the data folder. In Yioop,
 the database is used to manage users, roles, locales, and crawls. Data for
-crawls themselves are NOT stored in the database.</dd>
+crawls themselves are NOT stored in the database. Suggest a url data
+is stored in data in the file suggest_url.txt, certain cron information
+about machines is saved in cron_time.txt, and plugin configuration information
+can also be stored in this folder.</dd>
 <dt>locale</dt><dd>This is generally a copy of the locale folder mentioned
 earlier. In fact, it is the version that Yioop will try to use first.
 It contains any customizations that have been done to locale for this instance
@@ -1485,6 +1584,11 @@ query to Yioop such as:
 <pre>
 http://my-yioop-instance-host/?f=json&amp;q=query+terms
 </pre>
+<h3 id='tools'>Search Tools Page</h3>
+<img src='resources/SearchTools.png' alt='Search Tools Page'/>
+<img src='resources/SuggestAUrl.png' alt='Suggest A Url Form'/>
+    <p><a href="#toc">Return to table of contents</a>.</p>
+
     <p><a href="#toc">Return to table of contents</a>.</p>
 <h3 id='settings'>Settings</h3>
 <p>In
@@ -1493,8 +1597,10 @@ the corner of the page with the main search form is a Settings-Signin element:
 <img src='resources/SettingsSignin.png' alt='Settings Sign-in Element'/>
 <p>
 This element provides access for a user to change their search settings
-by clicking Settings. The Sign In link provides access to the Admin panel for
-the website.
+by clicking Settings. The Sign In link provides access to the Admin and User
+Accounts panels for the website. Clicking the Sign In link also takes one
+to a page where one can register for an account if Yioop is set up to allow
+user registration.
 </p>
 <img src='resources/Settings.png' alt='The Settings Form'/>
 <p>On the Settings page, there are currently three items which can be adjusted:
@@ -1514,9 +1620,6 @@ tell Yioop to use the French from France for outputting
 text. You can also add &its= the Unix
 timestamp of the search index you want.
 </p>
-<h3 id='tools'>Search Tools Page</h3>
-<img src='resources/SearchTools.png' alt='Search Tools Page'/>
-<img src='resources/SuggestAUrl.png' alt='Suggest A Url Form'/>
     <p><a href="#toc">Return to table of contents</a>.</p>

     <h3 id='mobile'>Mobile Interface</h3>
@@ -1541,6 +1644,9 @@ timestamp of the search index you want.
     except for the above minor changes, these instructions will also apply to
     the mobile setting.
     </p>
+    <p><a href="#toc">Return to table of contents</a>.</p>
+
+
     <h2 id='social'>User Accounts and Social Features</h2>
     <h3 id='registration'>Registration and Signin</h3>
 <p>Clicking on the Sign In link on the corner of the Yioop web site will