Revising documentation.thtml and coding.thtml for Version 0.92, a=chris

Chris Pollett [2013-01-04 07:Jan:th]

Filename
en-US/pages/documentation.thtml

diff --git a/en-US/pages/documentation.thtml b/en-US/pages/documentation.thtml
index e334544..f7ea6de 100755
--- a/en-US/pages/documentation.thtml
+++ b/en-US/pages/documentation.thtml
@@ -330,8 +330,8 @@
     large scale useful data sets that can be easily licensed. Raw data dumps
     do not contain indexes of the data though. This makes sense because indexing
     technology is constantly improving and it is always possible to re-index
-    old data. Yioop supports importing and indexing data from ARC, database
-    queries, log files MediaWiki XML dumps, and Open Directory RDF. It also
+    old data. Yioop supports importing and indexing data from ARC,
+    MediaWiki XML dumps, and Open Directory RDF. It also
     supports re-indexing of old Yioop data files created after version 0.66,
     and indexing crawl mixes. This means using Yioop
     you can have searchable access to many data sets as well as have the
@@ -444,7 +444,7 @@
     Yioop data via a function api.</li>
     <li>Yioop has been optimized to work well with smart phone web browsers
     and with tablet devices.</li>
-    <li>Yioop has built-in support for image and video specific search</li>
+    <li>Yioop has built-in support for image and video specific search.</li>
     </ul>
     <p><a href="#toc">Return to table of contents</a>.</p>

@@ -619,7 +619,7 @@ database, queue server, and robot settings. It will look
 something like:
 </p>
 <img src='resources/ConfigureScreenForm2.png' alt='The configure form'/>
-<p>The <b>Debug Display</b> field set has three check boxes: Error Info, Query
+<p>The <b>Debug Display</b> fieldset has three check boxes: Error Info, Query
 Info, and Test Info. Checking Error Info will mean that when the Yioop
 web app runs, any PHP Errors, Warnings, or Notices will be displayed
 on web pages. This is useful if you need to do debugging, but should not
@@ -631,7 +631,7 @@ systems library classes if the browser is navigated to
 http://YIOOP_INSTALLATION/tests/. None of these debug settings should
 be checked in a production environment.
 </p>
-<p>The <b>Search Access</b> field set has three check boxes:
+<p>The <b>Search Access</b> fieldset has three check boxes:
 Web, RSS, and API. These control whether a user can use the
 web interface to get query results, whether RSS responses to queries
 are permitted, or whether or not the function based search API is
@@ -667,7 +667,7 @@ changing database information you might have to sign in again.
 <p>The <b>Search Page Elements and Links</b> fieldset is used to tell
 you which element and links you would like to have presented on the search
 landing and search results pages. The Word Suggest check box controls whether
-a drop down of word suggestions should be presented by Yioop when a user
+a dropdown of word suggestions should be presented by Yioop when a user
 starts typing in the Search box. The Subsearch checkbox controls whether the
 links for Image, Video, and News search appear in the top bar of Yioop
 You can actually configure what these links are in the
@@ -781,11 +781,12 @@ the Yioop folder's various sub-folders contain:
 <dt>bin</dt><dd>This folder is intended to hold command-line scripts
 which are used in conjunction with Yioop. In addition to the fetcher.php
 and queue_server.php script already mentioned, it contains arc_tool.php,
-mirror.php, and query_tool.php. arc_tool.php can be used to examine the contents
-of WebArchiveBundle's and IndexArchiveBundle's from the command line.
-mirror.php can be used if you would like to create a mirror/copy of a Yioop
-installation.  Finally, query_tool.php can be used to run queries
-from the command-line.</dd>
+code_tool.php, mirror.php, and query_tool.php. arc_tool.php can be used to
+examine the contents of WebArchiveBundle's and IndexArchiveBundle's from the
+command line. code_tool.php is for use by developers to maintain the Yioop
+code-base in various ways. mirror.php can be used if you would like to create
+a mirror/copy of a Yioop installation. Finally, query_tool.php can be used to
+run queries from the command-line.</dd>
 <dt>configs</dt><dd>This folder contains configuration files. You will
 probably not need to edit any of these files directly as you can set the most
 common configuration settings from with the admin panel of Yioop. The file
@@ -850,7 +851,7 @@ English porter stemmer is present in this folder.</dd>
 with the Yioop system. A locale encapsulates data associated with a
 language and region. A locale is specified by an
 <a href='http://en.wikipedia.org/wiki/IANA_language_tag'>IETF language tag</a>.
-So for instance, within the locale folder there is a folder en-US for the
+So, for instance, within the locale folder there is a folder en-US for the
 locale consisting of English in the United States. Within a given locale tag
 folder there is a file configure.ini which contains translations of
  string ids to string in the language of the locale. This approach is
@@ -865,15 +866,19 @@ configure.ini, there is a statistics.txt file which has info about what
 percentage of the id's have been translated. In addition to configure.ini and
 statistics.txt, the locale folder for a language contains two sub-folders:
 pages, containing static html (with extension .thtml) files which might need
-to be translated, and resources. The resources folder contains files:
-suggest-trie.txt.gz, a <a href="http://en.wikipedia.org/wiki/Trie"
->Trie data structure</a> used for search bar word suggestions and tokenizer.php
-which either specifies the number of characters for this language to
-constitute a char gram or contains a stemmer class used to stem terms for
-this language. This folder might also contain a Bloom filter file with a name
-like all_word_grams.ftr which would be used to do word gramming of sequences of
-words that should be treated as a unit, for example, "Honda Accord" or
-"Bill Clinton".
+to be translated, and resources. The resources folder contains the files:
+<i>locale.js</i>, which contains locale-specific Javascript code such as the
+variable alpha, which lists the letters of the alphabet for the
+language in question for spell check purposes, and roman_array for mapping
+between the roman alphabet and the character system of the locale in question;
+<i>suggest-trie.txt.gz</i>, a <a href="http://en.wikipedia.org/wiki/Trie"
+>Trie data structure</a> used for search bar word suggestions;
+and <i>tokenizer.php</i>, which either specifies the number of characters for
+this language to constitute a char gram or contains a stemmer class used to stem
+terms for this language. This folder might also contain a Bloom filter file
+with a name like all_word_grams.ftr which would be used to do word gramming of
+sequences of words that should be treated as a unit, for example, "Honda Accord"
+or "Bill Clinton".
 </dd>
 <dt>models</dt><dd>This folder contains the subclasses of Model used by
 Yioop Models are used to encapsulate access to secondary storage.
@@ -886,7 +891,7 @@ by a Yioop installation. At present, datasources have been defined
 for sqlite, sqlite3, and mysql databases.</dd>
 <dt>resources</dt><dd>Used to store binary resources such as graphics, video,
 or audio. For now, just stores the Yioop logo.</dd>
-<dt>scripts</dt><dd>This folder contains the Javascript files used by Yioop
+<dt>scripts</dt><dd>This folder contains the Javascript files used by Yioop.
 </dd>
 <dt>tests</dt><dd>This folder contains UnitTest's for various lib
 components. Yioop comes with its own minimal UnitTest class which is
@@ -1043,19 +1048,34 @@ crawled from that IP address.</p>
 <img src='resources/Cache.png' alt='Example Cache Results'
 width="70%"/>
 <p>As the above illustrates, on a cache link click,
-Yioop will list the time of download and highlight
-the query terms. It should be noted that cached copies of web pages are
-stored on the fetcher which originally downloaded the page. The IndexArchive
-associated with a crawl is stored on the queue server and can be moved
-around to any location by simply moving the folder. However, if an archive
-is moved off the network on which fetcher lives, then the look up of a
-cached page might fail. On the cached page there is a "Toggle
-extracted summary" link. Clicking this will show the title, summary, and
-links that were extracted from the full page and indexed. No other terms
+Yioop will display a cached version of the page.
+The cached version has a link to the original version and download time
+at the top. Next there is a link to display all caches of this page that
+Yioop has in any index. This is followed by a link for extracted summaries,
+then in the body of the cached document the query terms are highlighted.
+Links within the body of a cached document first target a cached version
+of the linked-to page whose download time is as soon after that of the
+current cached page as possible. If Yioop doesn't have a cache for a link
+target then it goes to the location pointed to by that target.
+Clicking on the history toggle produces the following interface:
+</p>
+<img src='resources/CacheHistory.png' alt='Example Cache History UI'
+width="70%"/>
+<p>
+This lets you select different caches of the page in question.
+</p>
+<p>Clicking the "Toggle extracted summary" link will show the title, summary,
+and links that were extracted from the full page and indexed. No other terms
 on the page are used to locate the page via a search query. This
 can be viewed as an "SEO" view of the page.</p>
 <img src='resources/CacheSEO.png' alt='Example Cache SEO Results'
 width="70%"/>
+<p>It should be noted that cached copies of web pages are
+stored on the fetcher which originally downloaded the page. The IndexArchive
+associated with a crawl is stored on the queue server and can be moved
+around to any location by simply moving the folder. However, if an archive
+is moved off the network on which fetcher lives, then the look up of a
+cached page might fail.</p>
 <p>In addition, to a straightforward web search, one can also do image,
 video, news searches by clicking on the Images, Video, or News links in
 the top bar of Yioop search pages. Below are some examples of what these look
@@ -1223,7 +1243,7 @@ use of memcache or file cache.</li>
 which had some_number of outgoing links. For example, numlinks:5.</li>
 <li><b>os:operating_system</b>  returns summaries of all documents
 served on servers using the given operating system. For example,
-<i>os:centos</i>, make sure to use lower case.</li>
+<i>os:centos</i>, make sure to use lowercase.</li>
 <li><b>path:path_component_of_url</b> returns summaries of all documents
 whose path component begins with path_component_of_url. For example,
 path:/phpBB would return all documents whose path started with phpBB,
@@ -1244,8 +1264,13 @@ the summaries of pages found at that url, host, or domain. As an example,
 <em>site:http://prints.ucanbuyart.com/</em>,
 <em>site:prints.ucanbuyart.com</em>, <em>site:.ucanbuyart.com</em>,
 <em>site:ucanbuyart.com</em>, <em>site:com</em>, will all returns with
-decreasing specificity. To return all pages listed in a Yioop index you can
-do <i>site:all</i>.
+decreasing specificity. To return all pages and links to
+pages in the Yioop index, you can do <i>site:any</i>. To return all pages
+(as opposed to pages and links to pages) listed in a Yioop index you can
+do <i>site:all</i>. site:all doesn't return any links, so you can't group links
+to urls and pages of that url together. If you want all sites where
+one has a page in the index as well as links to that site, then you can do
+<i>site:doc</i>.
 </li>
 <li><b>size:num_bytes</b> returns summaries of all documents whose download
 size was between num_bytes and num_bytes + 5000. num_bytes must be a multiple
@@ -1262,9 +1287,12 @@ multiplying all score for this portion of a query by some_number. For example,
 would  multiply scores satisfying Chris Pollett  and on wikipedia.org by
 5 and union these with those satisfying Chris Pollett
 </li>
-
 </ul>
-<p>In addition, to using the search form interface to query Yioop it is also
+<p>Although we didn't say it next to each query form above, if it makes sense,
+there is usually an <i>all</i> variant to a form. For example, os:all returns
+all documents from servers for which os information appeared in the headers.</p>
+<p>
+In addition to using the search form interface to query Yioop, it is also
 possible to query Yioop and get results in Open Search RSS format. To
 do that you can either directly type a URL into your browser of the form:</p>
 <pre>
@@ -1331,7 +1359,7 @@ Yioop from a mobile device.
     style="width:280px;height:280px"/>
     <p>For Admin pages, each activity is controlled in an analgous fashion
     to the non-mobile setting, but the Activity element has been replaced
-    with a drop-down:</p>
+    with a dropdown:</p>
 <img src='resources/MobileAdmin.png' alt='Example Mobile Admin Activity'
     style="width:280px;height:280px"/>
     <p>We now resume our discussion of how to use each of the Yioop admin
@@ -1669,9 +1697,9 @@ http://www.facebook.com/###!Facebook###!A%20famous%20social%20media%20site
     From the initial crawl options screen clicking on the Archive Crawl
     tab gives one the following form:</p>
 <img src='resources/ArchiveCrawlOptions.png' alt='Archive Crawl Options Form'/>
-    <p>The drop down lists all previously done crawls that are available for
+    <p>The dropdown lists all previously done crawls that are available for
     recrawl.</p>
-<img src='resources/ArchiveCrawlDropDown.png' alt='Archive Crawl Drop Down'/>
+<img src='resources/ArchiveCrawlDropDown.png' alt='Archive Crawl dropdown'/>
     </p>These include both previously done Yioop crawls, previously
     down recrawls (prefixed with RECRAWL::), Yioop Crawl Mixes (prefixed with
     MIX::), and crawls
@@ -1706,14 +1734,14 @@ http://www.facebook.com/###!Facebook###!A%20famous%20social%20media%20site
     PROFILE_DIR/cache/archives/my_wiki_media_files and put in it a
     file arc_description.ini in the format to be discussed in a moment.
     The arc_description.ini file's contents are used to give a description
-    for the archive crawl that will be displayed in the archive drop-down
+    for the archive crawl that will be displayed in the archive dropdown
     as well as specify the kind of archives the folder contains. An
     example arc_description.ini might look like:</p>
     <pre>
 arc_type = 'MediaWikiArchiveBundle';
 description = 'English Wikipedia 2012';
     </pre>
-    <p>In the Archive Crawl drop-down the description will appear with the
+    <p>In the Archive Crawl dropdown the description will appear with the
     prefix ARCFILE:: and you can then select it as the source to crawl.
     Currently, there are three supported arc_types. For folders containing
     file in Internet Archive arc format one can use:</p>
@@ -1776,7 +1804,7 @@ OdpRdfArchiveBundle
     the second result, the last group is used to display any remaining search
     results.</p>
     <p>The UI for groups works as follows: The top row has three columns.
-    To add new components to a group use the drop-down in the first column.
+    To add new components to a group use the dropdown in the first column.
     The second column controls for how many results
     the particular crawl group should be used. Different groups results are
     presented in the order they appear in the crawl mix. The last group is
@@ -1794,7 +1822,7 @@ OdpRdfArchiveBundle
     a crawl mix behave in a conditional many by using the "if:" meta word
     described in the search and user interface section. The last link in a
     crawl row allows you to delete a crawl from a crawl group. For changes on
-    this page to take effect, the "Save" button beneath this drop-down must
+    this page to take effect, the "Save" button beneath this dropdown must
     be clicked.
     </p>
     <p><a href="#toc">Return to table of contents</a>.</p>
@@ -1802,13 +1830,13 @@ OdpRdfArchiveBundle
     <p>Several properties about how web pages are indexed can be controlled
     by clicking on Page Options. This leads to a form which looks like:</p>
 <img src='resources/PageOptions.png' alt='The Page Options form'/>
-    <p>The Byte Range to Download drop-down controls how many bytes out of
+    <p>The Byte Range to Download dropdown controls how many bytes out of
     any given web page should be downloaded. Smaller numbers reduce the
     requirements on disk space needed for a crawl; bigger numbers would
-    tend to improve the search results. The next drop-down,
+    tend to improve the search results. The next dropdown,
     Allow Page Recrawl After, controls how many days that Yioop keeps
     track of all the URLs that it has downloaded from. For instance, if one
-    sets this drop-down to 7, then after seven days Yioop will clear its
+    sets this dropdown to 7, then after seven days Yioop will clear its
     Bloom Filter files used to store which urls have been downloaded, and it
     would be allowed to recrawl these urls again if they happened in links. It
     should be noted that all of the information from before the seven
@@ -1834,7 +1862,7 @@ OdpRdfArchiveBundle
     It has three main forms: An edited urls forms, a url editing form,
     and a filter websites form.</p>
     <p>If one has already edited the summary for
-    a url, then the drop-down in the edited urls form will list this url. One
+    a url, then the dropdown in the edited urls form will list this url. One
     can select it and click load to get it to display in the url editing
     form. The purpose of the url editing form is to allow a user to change
     the title and description for a url that appears on a search results
@@ -1880,7 +1908,7 @@ OdpRdfArchiveBundle
     the Media Kind can be either Video or RSS. Video Media sources
     are used to help Yioop recognize links which are of videos on
     a web video site such as YouTube. This helps in both tagging
-    such pages with the meta word media:video in a Yioop index, and
+    such pages with the meta word media:video in a Yioop index and
     in being able to render a thumbnail of the video in the search results.
     When the media kind is set to video, this form has three fields:
     Name, which should be a short familiar name for the video site (for example,
@@ -1963,15 +1991,15 @@ OdpRdfArchiveBundle
 <img src='resources/ManageMachines.png' alt='The Manage Machines form'/>
     <p>The Add machine form at the top of the page allows one to add a new
     machine to be controlled by this Yioop instance. The Machine
-    Name field let's you give this machine an easy to remember name
+    Name field lets you give this machine an easy to remember name.
     The Machine URL field should be filled in with the URL to the
-    installed Yioop instance. The is Mirror checkbox says whether you want
+    installed Yioop instance. The Mirror checkbox says whether you want
     the given Yioop installation to act as a mirror for another Yioop
-    installation. Checking it will reveal a drop-down menu that allows you
+    installation. Checking it will reveal a dropdown menu that allows you
     to choose which installation amongst the previously entered machines
     you want to mirror. The Has Queue Server checkbox is used to say whether
     the given Yioop installation will be running a queue server or not.
-    Finally, the  Number of Fetchers drop down allows you to say how many
+    Finally, the  Number of Fetchers dropdown allows you to say how many
     fetcher instances you want to be able to manage for that machine.
     The Delete Machine form allows you to remove a machine that you either
     misconfigured  or that you no longer want to manage through this Yioop
@@ -2033,7 +2061,7 @@ OdpRdfArchiveBundle
     and a localizer can use this interface to see what is written in these
     files. Yioop automatically creates these files in the directory the
     localizer is localizing for, and the localizer can translate their contents
-    into the appropriate language. Beneath this drop-down, the
+    into the appropriate language. Beneath this dropdown, the
     Edit Locale page mainly consists of a two column table: the right column
     being string ids, the left column containing what should be their
     translation into the given locale. If no translation exists yet,
@@ -2107,23 +2135,29 @@ OdpRdfArchiveBundle
     the length of string to use in doing char-gramming. If you add a
     language to Yioop and want to use char gramming merely add a tokenizer.php
     to the corresponding locale folder with such a line in it.</p>
-    <h3>Using token_tool.php to improve search performance and relevance
-    for your language</h3>
+    <h3 id="token_tool">Using token_tool.php to improve search performance and
+    relevance for your language</h3>
     <p>configs/token_tool is used to create suggest word dictionaries and 'n'
     word gram filter files for the Yioop search engine. To create either of
     these items, the user puts a source file in Yioop's WORK_DIRECTORY/prepare
     folder. Suggest word dictionaries are used to supply the content of the
     dropdown of search terms that appears as a user is entering a query in
-    Yioop. To make a suggest dictionary one can use a command like:</p>
+    Yioop. They are also used to do spell correction suggestions after a
+    search has been performed. To make a suggest dictionary one can use a
+    command like:</p>
     <pre>
     php token_tool.php dictionary filename locale endmarker
     </pre>
     <p>
-    Here filename should be in the current folder or PREP_DIR and should consist
-    of one word per line, locale is the locale this suggest (for example, en-US)
-    file is being made for and where a file suggest-trie.txt.gz will be written,
+    Here <i>filename</i> should be in the current folder or PREP_DIR, locale is
+    the locale (for example, en-US) this suggest
+    file is being made for and where a file suggest_trie.txt.gz will be written,
     and endmarker is the end of word symbol to use in the trie. For example,
-    $ works pretty well.
+    $ works pretty well. The format of <i>filename</i> should be a sequence of
+    lines, each line containing a word or phrase followed by a space followed by
+    a frequency count, i.e., the last thing on the line should be a number.
+    Given a corpus of documents, the frequency for a word would be the number of
+    occurrences of that word in those documents.
     </p>
     <p>
     token_tool.php can also be used to make filter files. A filter file is used
@@ -2153,16 +2187,17 @@ OdpRdfArchiveBundle

     <h3>Obtaining data sets for token_tool.php</h3>
     <p>
-    Many word lists are obtainable on the web for free with Creative Commons
-    licenses. A good starting point is:</p>
+    Many word lists with frequencies are obtainable on the web for free
+    with Creative Commons licenses. A good starting point is:</p>
     <pre>
     <a href="http://en.wiktionary.org/wiki/Wiktionary:Frequency_lists"
     >http://en.wiktionary.org/wiki/Wiktionary:Frequency_lists</a>
     </pre>
-    <p>A little script-fu can generally take such a list and put it into the
-    format of one word/term per line which is needed by token_tool.php</p>
+    <p>A little script-fu can generally take such a list and output it with the
+    line format of "word/phrase space frequency" needed by
+    token_tool.php.</p>
     <p>
-    For filter files, Raw page count dumps can be found at:</p>
+    For filter files, raw page count dumps can be found at:</p>
     <pre>
     <a href="http://dumps.wikimedia.org/other/pagecounts-raw/"
     >http://dumps.wikimedia.org/other/pagecounts-raw/</a>
@@ -2180,9 +2215,9 @@ OdpRdfArchiveBundle
     This page lists all the dumps according to date they were taken. Choose any
     suitable date or the latest. A link with a label such as 20120104/,
     represents a dump taken on  01/04/2012. Click this link to go in turn to a
-    page which has many links based on type of content you are looking for. For
-    this tool you are interested in files under "Recombine all pages, current
-    versions only".</p>
+    page which has many links based on the type of content you are looking for.
+    For this tool you are interested in files under "Recombine all pages,
+    current versions only".</p>
     <p>
     Beneath this we might find a link with a name like:</p>
     <pre>
@@ -2191,6 +2226,36 @@ OdpRdfArchiveBundle
     <p>
     which is a file that could be processed by this tool.
     </p>
+    <h3>Spell correction and romanized input with locale.js</h3>
+    <p>Yioop supports the ability to suggest alternative queries
+    after a search is performed. These queries are mainly restricted to
+    fixing typos in the original query. In order to calculate
+    these spelling corrections, Yioop takes the query and for each query term
+    computes each possible single character change to that term. For each
+    of these it looks up in the given locale's suggest_trie.txt.gz
+    a frequency count of that variant, if it exists. If the best suggestion
+    is some multiple better than the frequency count of the original query
+    then Yioop suggests this alternative query. In order for this to
+    work, Yioop needs to know what constitutes a single character in the
+    original query. The file locale.js in the
+     WORK_DIRECTORY/locale/LOCALE_TAG/resources folder can be used
+    to specify this for the locale given by LOCALE_TAG. To do this,
+    all you need to do is specify a Javascript variable alpha. For example,
+    for French (fr-FR) this looks like:</p>
+    <pre>
+var alpha = "aåàbcçdeéêfghiîïjklmnoôpqrstuûvwxyz";
+    </pre>
+    <p>The letters do not have to be in any alphabetical order, but should be
+    comprehensive of the non-punctuation symbols of the language in question.
+    </p>
+    <p>Another thing locale.js can be used for is to give mappings
+    between roman letters and other scripts for use in Yioop's autosuggest
+    dropdown that appears as you type a query. As you type, the
+    scripts/suggest.js function onTypeTerm is called. This in turn
+    will call a particular locale's locale.js function transliterate(query)
+    if it exists. This function should return a string with the result
+    of the transliteration. An example of doing this is given for the
+    Telugu locale in Yioop.</p>
     <p><a href="#toc">Return to table of contents</a>.</p>
     <h2 id='framework'>Building a Site using Yioop as Framework</h2>
     <p>The Yioop code base can serve as the code base for new custom search
@@ -2209,7 +2274,7 @@ OdpRdfArchiveBundle
     <p>The index.php script is the first script run by the Yioop web app.
     It has an array $available_controllers which lists the controllers
     available to the script. The names of the controllers in this array are
-    lower case. Based on whether the $_REQUEST['c'] variable is in this array
+    lowercase. Based on whether the $_REQUEST['c'] variable is in this array
     index.php either  loads the file {$_REQUEST['c']}_controller.php or loads
     whatever the default controller is. index.php also checks for the existing
     of APP_DIR/index.php and loads it if it exists. This gives
@@ -2253,7 +2318,7 @@ OdpRdfArchiveBundle
     $mycontroller-&gt;mypluginnamePlugin
 </pre>
     <p>Notice in each expression the name of the
-    particular model or plugin is lower case. Given this way of referring
+    particular model or plugin is lowercase. Given this way of referring
     to models, a controller can invoke a models methods to get data out
     of the file system or from a database with expressions like:</p>
 <pre>
@@ -2322,7 +2387,7 @@ OdpRdfArchiveBundle
     of pages which may be common across Views. Helper's on the other hand
     are used typically to render UI elements. For example, OptionsHelper
     has a render($id, $name, $options, $selected) method and is used to
-    draw select drop-downs.
+    draw select dropdowns.
     </p>
     <p>When rendering a View or Element one often has css, scripts, images,
     videos, objects, etc. In BASE_DIR, the targets of these tags would typically
@@ -2416,8 +2481,8 @@ OdpRdfArchiveBundle
     <p>The same basic urls as above can return RSS results simply by appending
     to the end of the them &ampf=rss. This of course only makes sense for
     usual and related url queries -- cache queries return web-pages not
-    a list of search results. An example of a portion of an RSS result might
-    look like:</p>
+    a list of search results. Here is an example of what a portion of an RSS
+    result might look like:</p>
 <pre>
 &lt;?xml version="1.0" encoding="UTF-8" ?&gt;
 &lt;rss version="2.0" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/"
@@ -2481,18 +2546,18 @@ xmlns:atom="http://www.w3.org/2005/Atom"
     <p><a href="#toc">Return to table of contents</a>.</p>
     <h2 id='customizing'>Customizing Yioop</h2>
     <p>One advantage of an open-source project is that you have complete
-    access to the source code. Thus, you can modify Yioop to fit in
-    with your existing project or add new feel free to add new features to
-    Yioop. In this section, we look a little bit at some common ways you
+    access to the source code. Thus, Yioop can be modified to fit in
+    with your existing project. You can also freely add new features onto
+    Yioop. In this section, we look a little bit at some of the common ways you
     might try to modify Yioop as well as ways to examine the output of a
     crawl in a more technical manner. If you decide to modify the source code
-    it is recommended you look at the <a
+    it is recommended that you look at the <a
     href="#files">Summary of Files and Folders</a> above again, as well
     as look at the <a href="http://www.seekquarry.com/yioop-docs/">online
-    Yioop documentation</a>.</p>
+    Yioop code documentation</a>.</p>

     <h3>Handling new File Types</h3>
-    <p>One relatively easy enhancement to Yioop would be to enhance
+    <p>One relatively easy enhancement to Yioop is to enhance
     the way it processes an existing file type or to get it to process
     new file types. Yioop was written from scratch without dependencies
     on existing projects. So the PHP processors for Microsoft
@@ -2512,9 +2577,9 @@ xmlns:atom="http://www.w3.org/2005/Atom"
     </pre>
     <p>
     A good reference implementation of a TextProcessor subclass can be found in
-    html_processor.php. If you are trying to support a new file type, to get
-    Yioop to use your processor you need to edit the configs/config.php
-    file. In config.php you should add the extension of the file type
+    html_processor.php. If you are trying to support a new file type, then
+    to get Yioop to use your processor you need to edit the configs/config.php
+    file. In config.php, you should add the extension of the file type
     you are going to process to the array $INDEXED_FILE_TYPES. You will
     also need to add an entry to the $PAGE_PROCESSORS array of the
     form "new_mime_type_handle" =&gt; "NewProcessor" .
@@ -2608,7 +2673,23 @@ xmlns:atom="http://www.w3.org/2005/Atom"
     <p>This completes the discussion of how to write an indexing plugin.</p>
     <p><a href="#toc">Return to table of contents</a>.</p>
     <h2 id='commandline'>Yioop Command-line Tools</h2>
-    <h3>Configuring Yioop from the Command-line</h3>
+    <p>In addition to <a href="#token_tool">token_tool.php</a> which we
+    describe in the section on localization, Yioop comes with several useful
+    command-line tools and utilities. We next describe these in roughly
+    their order of likely utility:
+    </p>
+    <ul>
+    <li><a href="#configure_tool">bin/configure_tool.php: Used to
+    configure Yioop from the command-line</a></li>
+    <li><a href="#arc_tool">bin/arc_tool.php: Used to examine the contents of
+    WebArchiveBundles and IndexArchiveBundles</a></li>
+    <li><a href="#query_tool">bin/query_tool.php: Used to query an index from
+    the command-line</a></li>
+    <li><a href="#code_tool">bin/code_tool.php: Used to help code Yioop
+    and to help make clean patches for Yioop</a>
+    </li>
+    </ul>
+    <h3 id="configure_tool">Configuring Yioop from the Command-line</h3>
     <p>In a multiple queue server and fetcher setting, one might have web access
     only to the name server machine -- all the other machines might be on
     virtual private servers to which one has only command-line access. Hence,
@@ -2648,30 +2729,30 @@ Please choose an option:
     </pre>
     <p>
     Except for the Change root password option, these correspond to the
-    different fieldsets on the Configure activity. The command-line forms let
-    one gets from selecting one of these choise let one set the same
+    different fieldsets on the Configure activity. The command-line forms
+    one gets from selecting one of these choices let one set the same
     values as were described earlier in the
     <a href="#installation">Installation</a> section. The change root password
-    option lets one set the account password for root. i.e., the main admin
-    user.On a non-nameserver machine, it is probably simpler to go with
+    option lets one set the account password for root, i.e., the main admin
+    user. On a non-nameserver machine, it is probably simpler to go with
     a sqlite database, rather than hit on a global mysql database from
     each machine. Such a barebones local database set-up would typically
     only have one user, root.</p>
-    <p>Another thing to consider, when configuring a collection of Yioop
-    machines in such a setting, is that by default, under Search Access Set-up,
+    <p>Another thing to consider when configuring a collection of Yioop
+    machines in such a setting is that, by default, under Search Access Set-up,
     subsearch is unchecked. This means the RSS feeds won't be downloaded
     hourly on such machines. If one checks this, they will. This may or
     may not make sense to do -- it might be advantageous to distribute the
     downloading of RSS feeds across several machines -- any machine in
     a Yioop cluster can send media news results in response to a search query.
     </p>
-    <h3>Examining the contents of WebArchiveBundle's and
+    <h3 id="arc_tool">Examining the contents of WebArchiveBundles and
     IndexArchiveBundles</h3>
     <p>
-    The command-line script bin/arc_tool.php can be use to examine the
-    contents of a WebArchiveBundle or an IndexArchiveBundle. i.e., it gives
-    a print out of the web pages or summaries contained therein. It can also
-    be used to give information from the headers of these bundles. Finally,
+    The command-line script bin/arc_tool.php can be used to examine the
+    contents of a WebArchiveBundle or an IndexArchiveBundle. This tool gives
+    a print out of the web pages or summaries contained in such bundles. It can
+    also be used to give information from the headers of these bundles. Finally,
     it can be used to re-index an IndexArchiveBundle's dictionary based
     on the contents of the partial dictionaries in each of the bundle's
     posting_doc_shards. arc_tool is run from the command-line with the syntaxes:
@@ -2779,8 +2860,8 @@ still has more than one tier (tiers are the result of incremental
 log-merges which are made during the crawling process). The
 mergetiers command merges these tiers into one large tier which is
 then usable by Yioop for query processing.</p>
-    <h3>Querying an Index from the command-line</h3>
-<p>The command-line script bin/query_tool.php can be use to query
+    <h3 id="query_tool">Querying an Index from the command-line</h3>
+<p>The command-line script bin/query_tool.php can be used to query
 indices in the Yioop WORK_DIRECTORY/cache. This tool can be used
 on an index regardless of whether or not Apache is running. It can be
 used for long running queries that might timeout when run within a browser
@@ -2833,6 +2914,47 @@ all of the Yioop meta words should work so you can do queries like
 "my_query i:timestamp_of_index_want". Query results depend on the
 kind of language stemmer/char-gramming being used, so French results might be
 better if one specifies fr-FR than if one relies on the default en-US.</p>
+<h3 id="code_tool"> A Tool for Coding and Making Patches for Yioop</h3>
+<p>bin/code_tool.php can perform several useful tasks to help developers
+program for the Yioop environment. Below is a brief summary of its
+functionality:</p>
+<dl>
+<dt>php code_tool.php clean path</dt>
+    <dd>Replaces all tabs with four spaces and trims all whitespace off ends of
+    lines in the folder or file path</dd>
+
+<dt>php code_tool.php copyright path</dt><dd>
+    Adjusts all lines of the form 2009 - \d\d\d\d in the files in the folder
+    at path (or, if path is a file, in just that file) to
+    the form 2009 - this_year, where this_year is the current year.</dd>
+
+<dt>php code_tool.php longlines path</dt><dd>
+    Prints out all lines in files in the folder or file path which are
+    longer than 80 characters.</dd>
+
+<dt>php code_tool.php replace path pattern replace_string<br />
+&nbsp;&nbsp;&nbsp;&nbsp;or<br />
+php code_tool.php replace path pattern replace_string effect</dt><dd>
+    Prints all lines matching the regular expression pattern followed
+    by the result of replacing pattern with replace_string in the
+    folder or file path. Does not change files.</dd>
+
+<dt>php code_tool.php replace path pattern replace_string interactive</dt><dd>
+    Prints each line matching the regular expression pattern followed
+    by the result of replacing pattern with replace_string in the
+    folder or file path. Then it asks if you want to update the line.
+    Lines you choose for updating will be modified in the files.</dd>
+
+<dt>php code_tool.php replace path pattern replace_string change</dt><dd>
+    Each line matching the regular expression pattern is updated
+    by replacing pattern with replace_string in the
+    folder or file path. This format does not echo anything; it does a global
+    replace without interaction.</dd>
+
+<dt>php code_tool.php search path pattern</dt><dd>
+    Prints all lines matching the regular expression pattern in the
+    folder or file path.</dd>
+</dl>
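<p>To make the longlines and clean descriptions above concrete, here is a
rough sketch that approximates them with standard Unix tools. It is not how
code_tool.php is implemented: the function names long_lines and clean_preview
are hypothetical, clean_preview writes to standard output rather than modifying
the file, and the real tool's rules for which files it visits are not
reproduced here.</p>

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; they are NOT part of Yioop.

# Approximates "php code_tool.php longlines path".
# Prints file:line_number:text for each line longer than 80 characters
# in the file or folder given as $1.
long_lines() {
    grep -rHnE '.{81,}' "$1"
}

# Approximates "php code_tool.php clean path" for a single file.
# Replaces every tab with four spaces and strips trailing whitespace.
# Unlike the real tool, this only previews the result on stdout.
clean_preview() {
    tab=$(printf '\t')   # portable way to get a literal tab for sed
    sed -e "s/$tab/    /g" -e 's/[[:space:]]*$//' "$1"
}
```

<p>For example, long_lines some_folder lists the offending lines so they can
be wrapped by hand, and clean_preview some_file.php shows what a cleaned copy
would look like before anything is changed.</p>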
     <h2 id="references">References</h2>
     <dl>
 <dt id="APC2003">[APC2003]</dt>
