Coding Guidelines for Yioop

Introduction

In order to understand a software project, it helps to understand its organization and conventions. To encourage people to dive in and help improve Yioop, and to ensure contributions which are easily understood within the context of Yioop's current standards, this article describes the coding conventions, issue tracking, and commit process for Yioop. It first describes the coding styles to be used for various languages within Yioop. It then describes some guidelines for what kind of code should go into which kind of files in Yioop. Finally, it concludes with a discussion of how issues should be submitted to the issue tracker, how to make patches for Yioop, and how commit messages should be written.

General

  1. One of the design goals of Yioop was to minimize dependencies on other projects and libraries. When coming up with a solution to a problem, preference should be given to solutions which do not introduce new dependencies on external projects or libraries. Also, one should look for opportunities to eliminate existing dependencies, configuration requirements, etc.
  2. The coding language for Yioop is English. This means all comments within the source code should be in English.
  3. All data that will be written to the web interface should be localizable, that is, easily translatable into the written form of any human language. The section on localization discusses facilities in Yioop for doing this.
  4. Information written as log messages to log files and profiling information about queries (made available by the query info checkbox in Configure), which are not intended for end-users, do not need to be localized.
  5. Non-PHP project file names should be lowercase words; multi-word names should separate words with an underscore. For example, default_crawl.ini.
  6. To facilitate autoloading, all PHP file names should be camel-cased, starting with an uppercase letter. For example, AdminView.php.
  7. Each project file should begin with the GPL3 license as a comment in the appropriate format for the file in question. For example, for a PHP file, this might look like:
     /**
      *  SeekQuarry/Yioop --
      *  Open Source Pure PHP Search Engine, Crawler, and Indexer
      *
      *  Copyright (C) 2009 - 2014  Chris Pollett chris@pollett.org
      *
      *  LICENSE:
      *
      *  This program is free software: you can redistribute it and/or modify
      *  it under the terms of the GNU General Public License as published by
      *  the Free Software Foundation, either version 3 of the License, or
      *  (at your option) any later version.
      *
      *  This program is distributed in the hope that it will be useful,
      *  but WITHOUT ANY WARRANTY; without even the implied warranty of
      *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
      *  GNU General Public License for more details.
      *
      *  You should have received a copy of the GNU General Public License
      *  along with this program.  If not, see <http://www.gnu.org/licenses/>.
      *
      *  END LICENSE
      *
      * @author Chris Pollett chris@pollett.org
      * @package seek_quarry
      * @subpackage executables
      * @license http://www.gnu.org/licenses/ GPL3
      * @link http://www.seekquarry.com/
      * @copyright 2009 - 2014
      * @filesource
      */
    
    Here the subpackage might vary.
  8. All non-binary files in Yioop should be UTF-8 encoded. Files should not have a byte order mark.
  9. All non-binary files in Yioop should use four spaces for each level of indentation rather than tab characters. Further, all lines should be at most 80 columns long, and lines should not end with trailing white-space characters. It is recommended to use an editor which can display white-space characters and which can display a bar marking the 80th column. For example, one can use gEdit or vim.
  10. One should use one space before and after assignment, boolean, binary, and comparison operators. A single space should be used after, but not before, commas and semi-colons. No space should separate increment, decrement, and sign operators from their operands:
     if ($i == 0 && $j > 5 * $x) { /* some statements*/}
     $i = 7;
     $i += 3;
     $a = [1, 2, 3, 4];
     for ($i = 0; $i < $num; $i++) {
     }
    
  11. Control keywords such as if, for, foreach, while, switch, etc. should be followed by one blank space before the open parenthesis that follows them:
      if ($my_var > 0) {
      }
      while ($still_working) {
      }
      switch ($selector) {
      }
    
  12. Some leeway may be given on item 11 if it helps keep a line under 80 characters, provided that keeping the line under 80 characters helps program clarity.
  13. Do not use unstable code layouts such as:
     $something1   = 25;
     //...
     $something10  = 25;
     //...
     $something100 = 27;
    
    Although the equal signs are aligned, the alignment breaks whenever a variable name changes length. Also, do not put multiple statements on one line, such as:
     $a=1; $b=6; $c=7;
    
  14. Braces on class declarations, interface declarations, function declarations, and CSS declaration groups should be vertically aligned. For example,
     class MyClass
     {
         //code for class
     }
     
     interface MyInterface
     {
         //code for interface
     }
     
     function myFun()
     {
         //some code
     }
     
     .my-selector
     {
         /* some css */
     }
     
    
  15. Braces for conditionals, loops, etc. should roughly follow the one true brace convention (1TBS):
     if (cond) { /*single statement should still use braces*/}
     
     if (cond) {
         //some statements
     } else if (other_cond) {
         // another condition
     } else {
         // yet another condition
     }
    
     switch ($my_var) {
         case 1:
             break;
         case 2:
             // no break: add a comment like this when a case falls through
         case 3:
             break;
         default:
     }
     
     while (something) {
         //do something
     }
     
     for ($i = 0; $i < $num; $i++) {
     }
    
  16. The bodies of classes, functions, conditionals, loops, and other code blocks should be indented four spaces. Code should not appear on the same line as an opening brace or on the same line as a closing brace:
     class MyClass
     {   function myFun() // not allowed
         {
         }
     }
     
     if (something) {
         $i++;
         $j++; } // not allowed
     
     if (something) {
         $i++;
         $j++;
     } // good
    
    An exception is allowed for single-line code blocks:
     if (something) { $i++; } // is allowed
     if (something) {
         $i++;
     } //is preferred
    
  17. When a non-compound statement is split across several lines, all lines after the first should be indented four spaces:
     // a long function call
     setlocale(LC_ALL, $locale_tag, $locale_tag . '.UTF-8',
         $locale_tag . '.UTF8', $locale_tag . ".TCVN", $locale_tag . ".VISCII",
         $locale_tag_parts[0], $locale_tag_parts[0] . '.UTF-8',
         $locale_tag_parts[0] . '.UTF8', $locale_tag_parts[0] . ".TCVN");
     
     // a case where the conditional of an if is long
     if (!file_exists("$tag_prefix/statistics.txt") ||
         filemtime("$tag_prefix/statistics.txt") <
         filemtime("$tag_prefix/configure.ini")) {
         //code
     }
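Several of the rules above are mechanical enough to check with a script. As one illustration, here is a minimal Javascript sketch of a checker for the layout rules in items 8 and 9. It is not a tool that ships with Yioop: the function checkLayout is hypothetical, it assumes the file's bytes have already been decoded from UTF-8 (so it cannot itself detect invalid UTF-8), and it approximates columns by counting code points, which undercounts wide characters such as CJK ideographs.

```javascript
/*
 *  Checks file text against the layout rules of items 8 and 9: no byte
 *  order mark, no tab characters, lines of at most 80 columns, and no
 *  trailing white-space. checkLayout is a hypothetical helper.
 *
 *  @param String text  the file contents, already decoded from UTF-8
 *  @return Array  problems, each of the form "line N: description"
 */
function checkLayout(text)
{
    var problems = [];
    if (text.charCodeAt(0) === 0xFEFF) {
        problems.push("line 1: file begins with a byte order mark");
        text = text.slice(1);
    }
    var lines = text.split("\n");
    for (var i = 0; i < lines.length; i++) {
        // ignore a carriage return left by a Windows line ending
        var line = lines[i].replace(/\r$/, "");
        var line_number = i + 1;
        if (line.indexOf("\t") !== -1) {
            problems.push("line " + line_number + ": contains a tab");
        }
        // count code points, which approximates columns for most text
        if (Array.from(line).length > 80) {
            problems.push("line " + line_number + ": longer than 80 columns");
        }
        if (/[ \t]+$/.test(line)) {
            problems.push("line " + line_number + ": trailing white-space");
        }
    }
    return problems;
}
```

For example, checkLayout("\uFEFF<?php\n\tfoo();  \n") reports the byte order mark on line 1, then both the tab and the trailing white-space on line 2.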
    
     

PHP

Most of the code for Yioop is written in PHP. Here are some conventions that Yioop programmers should follow with regard to this language:
  1. Yioop code should adhere to the PHP Framework Interop Group's PSR-2 Coding Style Guidelines.
  2. Yioop code should all be in the seekquarry\yioop namespace or a subspace of this namespace. Global variables and other variables which affect namespaces outside seekquarry\yioop should not be used.
  3. All code should be in classes, except for web app entry points and the files: configs/Config.php, src/executables/CodeTool.php, src/library/LocaleFunctions.php, src/library/UpgradeFunctions.php, and src/library/Utility.php. Code should rely on the autoloading mechanism to load all other files.
  4. A PHP file should omit the final closing ?> tag and should end with a single newline after its last line of text.
  5. Classes should be organized as:
     class MyClass
     {
         // Variable Declarations
         public $some_var;
     
         // Constant Declarations
         const SOME_CONSTANT = 1;
     
         // Constructor
         public function __construct()
         {
             // code
         }
     
         // abstract member functions, if any (only in an abstract class)
         /*
         abstract public function someAbstractMethod($arg1, $arg2);
         */
     
         // non-static member functions
         public function someFunction($arg)
         {
             // code
         }
     
         // static member functions
         public static function someStaticFunction($arg)
         {
             // code
         }
     }
    
  6. Except for loop variables, where $i, $j, $k may be used, preference should be given to variable names which are full words: $queue rather than $q, for example. Some common abbreviations, such as $dir (for directory), $db (for database), and $str (for string), are permissible but should be used sparingly.
  7. Variable names should be descriptive. If this entails multi-word variable names, then the words should be separated by underscores. For example, $crawl_order.
  8. Defines, class constants, and global variables (used in more than one file) should be written in all-caps. All other variables should be lowercase only. Some example defines in Yioop are: BASE_DIR, NAME_SERVER, USER_AGENT_SHORT. Some example global variables are: $INDEXED_FILE_TYPES, $IMAGE_TYPES, $PAGE_PROCESSORS. Some example class constants in Yioop are: CrawlConstants::GOT_ROBOT_TXT, CrawlConstants::INVERTED_INDEX, IndexDictionary::DICT_BLOCK_SIZE.
  9. Function and member function names should be camel-cased beginning with a lowercase letter. For example, insert, crawlHash, getEntry, extractWordStringPageSummary.
  10. Class and interface names should be camel-cased beginning with an uppercase letter. For example, CrawlConstants, IndexShard, WebArchiveBundle. Class names in the web-app portion of Yioop (controllers, elements, helpers, layouts, models, and views) should begin with an uppercase letter, and every subsequent word except the last should be lowercase. For example, SearchfiltersModel, MachinestatusView. This facilitates Yioop's auto-loading mechanism.
  11. Yioop code should not use language features which would break backward compatibility with PHP 5.4.
  12. Each require/include, define, global variable, function, class, interface, field, constant, and member function should have a phpDoc docblock. These comments look like /** some comment */.
  13. The GPL license should be included in a phpDoc (page-level) docblock which includes @author, @package, @subpackage, @license, @link http://www.seekquarry.com/, @copyright, and @filesource tags. See the example in the General guidelines section.
  14. Docblocks for field variables (PHP properties) should use @var to say the type of the field. For example,
     /**
      * Number of days between resets of the page url filter
      * If nonpositive, then never reset filter
      * @var int
      */
     public $page_recrawl_frequency;
    
  15. Multi-line phpDocs should have a vertical line of *'s. For example,
     /**
      * First line of a phpDoc is a short summary, should not rehash function name
      *
      * Then a blank comment line, followed by
      * a longer description. This in turn is followed by any tags
      *
      * @param type $var_name description of variable
      * @return type description of returned value
      */
    
  16. Each parameter of a function/member function should be documented with an @param tag. The return value of a function/member function should be documented with an @return tag. For example,
     /**
      * Subtracts the two values $value1 and $value2
      *
      * This function is intended to be used as a callback function for sorting
      *
      * @param float $value1 a value to take the difference between
      * @param float $value2 the other value
      * @return float the difference
      */
     function difference($value1, $value2)
     {
         return $value1 - $value2;
     }
    
    Notice that the type of the argument or return value is given after the @param or @return tag. This could be NULL, int, float, string, array, object, resource, or mixed; mixed is used for return values which might be of more than one type.
  17. Multi-line comments within the body of a function or method should not use // such as:
     // first line
     // second line
    
    C-style comments /* */ should be used instead.
  18. Multi-line comments within the body of a function or method should not have a vertical stripe of stars. This prevents fragile layout problems with comments. For example, a good multi-line comment within a function might look like:
     /*
         This loop's end condition
         will be satisfied by something clever.
      */
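The naming rules in items 7 through 10 can likewise be approximated by regular expressions, for instance in a review script. The following Javascript sketch is illustrative only: name_patterns and checkName are hypothetical, and a pattern can check only the shape of a name, not whether it is descriptive as item 7 requires.

```javascript
/*
 *  Regular expressions approximating the naming rules in items 7 to 10.
 *  name_patterns is a hypothetical table, not part of Yioop.
 */
var name_patterns = {
    // $crawl_order: lowercase words separated by underscores
    variable: /^\$[a-z][a-z0-9]*(_[a-z0-9]+)*$/,
    // BASE_DIR, GOT_ROBOT_TXT: all-caps words separated by underscores
    constant: /^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$/,
    // $INDEXED_FILE_TYPES: a global variable used in more than one file
    global: /^\$[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$/,
    // crawlHash: camel-cased, beginning with a lowercase letter
    function: /^[a-z][a-zA-Z0-9]*$/,
    // IndexShard: camel-cased, beginning with an uppercase letter
    class: /^[A-Z][a-zA-Z0-9]*$/,
    // SearchfiltersModel: one capitalized word followed by the kind
    web_app_class:
        /^[A-Z][a-z0-9]*(Controller|Element|Helper|Layout|Model|View)$/
};

/*
 *  Tests whether a name has the expected shape for its kind
 *
 *  @param String kind  one of the keys of name_patterns
 *  @param String name  the identifier to test
 *  @return Boolean  whether name matches the pattern for kind
 */
function checkName(kind, name)
{
    if (!name_patterns.hasOwnProperty(kind)) {
        throw new Error("unknown kind of name: " + kind);
    }
    return name_patterns[kind].test(name);
}
```

With this sketch, checkName("web_app_class", "MachinestatusView") is true, while checkName("web_app_class", "MachineStatusView") is false, since only the last word of a web-app class name is capitalized.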
    

Javascript

  1. Variable names should not begin with $'s, to avoid confusion with PHP. Except for this, they should follow the same conventions as PHP variable names described earlier. Here are some example Javascript variable names: i, j, k, request, message_tag.
  2. Function names should be camel-cased beginning with a lowercase letter. For example, elt, redrawGroup, drawCrawlSelect.
  3. Function docblock comments have the same format as PHP ones, but rather than use /** */ use /* */. For example,
     /*
      *  Make an AJAX request for a url and put the results as inner HTML of a tag
      *
      *  @param Object tag  a DOM element to put the results of the AJAX request
      *  @param String url  web page to fetch using AJAX
      */
     function getPage(tag, url)
     {
         // code
     }
    
  4. Within functions, comments follow the same conventions as PHP.
  5. One should avoid echoing Javascript within PHP code and instead move such code as much as possible to an external .js file.
  6. Javascript should be included/inlined at the end of web pages, not at the beginning. This allows browsers to begin rendering a page rather than blocking while its scripts load.
  7. Javascript output via PHP in a controller should be output in the $data['SCRIPT'] field sent in the $data variable to a view.
  8. Localization needed by Javascript should be passed from PHP controllers using the $data['SCRIPT'] field sent in the $data variable to a view. For example, in PHP one might have:
     $data["MESSAGE"] = tl('admin_controller_configure_no_set_config');
     $data['SCRIPT'] .=
         "doMessage('<h1 class=\"red\" >".
         $data["MESSAGE"] . "</h1>');" .
         "setTimeout('window.location.href= ".
         "window.location.href', 3000);";
    
    The PHP function tl is used here to provide the translation, which will be used in the Javascript function call.
  9. Javascript output by a PHP View should be output as much as possible outside of PHP tags <?php ... ?> rather than with echo or similar statements.
  10. External Javascript files (.js files) should not contain any PHP code.
  11. External Javascript files should be included using the $data['INCLUDE_SCRIPTS'] array. For example,
     $data['INCLUDE_SCRIPTS'] = ["script1", "script2"];
    
    would include script1.js and script2.js from the Yioop script folder.
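On the client side of item 8, an external .js file can read translations that a controller supplied as a Javascript variable, so the file itself never needs PHP. The sketch below assumes the controller wrote into $data['SCRIPT'] a line defining an object page_translations that maps string ids to translated text; page_translations, the string id mix_js_remove, and the function translate are all hypothetical names, not part of Yioop.

```javascript
/*
 *  Looks up a translated string in a table that a controller has placed
 *  in the page through $data['SCRIPT'], for example with a line like
 *      var page_translations = {"mix_js_remove": "Remove"};
 *  Since the table arrives as data, the .js file itself holds no PHP.
 *
 *  @param Object translations  map from string ids to translated text
 *  @param String string_id  id of the string wanted
 *  @return String  the translation, or string_id if none was supplied
 */
function translate(translations, string_id)
{
    var has_translation = !!translations &&
        Object.prototype.hasOwnProperty.call(translations, string_id);
    if (has_translation) {
        return translations[string_id];
    }
    // falling back to the id makes a missing translation easy to spot
    return string_id;
}
```

Falling back to the string id, rather than to an empty string, is one possible design choice: an untranslated message stays visible on the page instead of silently disappearing.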

CSS

  1. CSS should W3C validate as either CSS 2 or CSS 3. CSS 3 styles should fail gracefully on non-supported browsers. Browser-specific prefixes such as -ms-, -moz-, -o-, and -webkit- should only be used for CSS 3 effects that the given browser does not yet support in unprefixed form.
  2. A CSS Rule Set in Yioop should follow one of the following formats:
     /* single selector case */
     selector
     {
         property1: value1; /* notice there should be a single space after the : */
         property2: value2; /* all property-value pairs should be terminated with a
                               semi-colon */
         ...
     }
    
     /* multiple selector case */
     selector1,
     selector2,
     ...
     {
         property1: value1;
         property2: value2;
         ...
     }
    
  3. Selectors should be written on one line. For example:
     .html-rtl .user-nav ul li
    
    Notice that a single space separates the parts of the selector.
  4. If an element should look different in a right-to-left language than a left-to-right language, then the .html-ltr and .html-rtl class selectors should be used. For example,
     .html-ltr .user-nav
     {
         margin: 0 0.5in 0 0;
         min-width: 10in;
         padding: 0;
         text-align: right;
     }
    
     .html-rtl .user-nav
     {
         margin: 0 0 0 0.5in;
         min-width: 10in;
         padding: 0;
         text-align: left;
     }
    
    For vertically written languages, one can use the selectors: .html-rl-tb, .html-lr-tb, .html-tb-rl, .html-tb-lr. Finally, if an element needs to be formatted differently for mobile devices, the .mobile selector should be used:
     .mobile .user-nav
     {
         font-size: 11pt;
         left: 0px;
         min-width: 0;
         padding: 0px;
         position: absolute;
         right: 0px;
         top: -10px;
         width: 320px;
     }
    
  5. To increase clarity, left-to-right, right-to-left, and mobile variants of the otherwise same selector should appear near each other in the given stylesheet file.
  6. Class and ID selectors should be lowercase. Multi-word selector names should have the words separated by a hyphen:
     .mobile
     #message
     #more-menu
     .user-nav
    
  7. Multiple selectors should be listed in alphabetical order. Properties in a rule-set should be listed alphabetically. For example,
     .html-ltr .role-table,
     .html-ltr .role-table td,
     .html-ltr .role-table th
     {
         border: 1px solid black;
         margin-left: 0.2in;
         padding: 1px;
     }
    
    One exception: a browser-specific property should be grouped next to its CSS 3 equivalent.
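Items 2 and 7 can also be checked mechanically. The Javascript below is a rough sketch rather than a full CSS parser: checkRuleSet is hypothetical, it assumes one declaration per line and no comments inside the rule set, and it does not apply the exception for browser-specific properties, so a prefixed property placed next to its CSS 3 equivalent may be reported as out of order.

```javascript
/*
 *  Checks the body of one CSS rule set (the text between its braces)
 *  against items 2 and 7: a single space after each colon, a terminating
 *  semi-colon on each declaration, and alphabetical property order.
 *  checkRuleSet is a hypothetical helper.
 *
 *  @param String body  declarations of the rule set, one per line
 *  @return Array  descriptions of any problems found
 */
function checkRuleSet(body)
{
    var problems = [];
    var previous = "";
    var lines = body.split("\n");
    for (var i = 0; i < lines.length; i++) {
        var line = lines[i].trim();
        if (line === "") {
            continue;
        }
        // property name, the spaces after the colon, then the value
        var match = /^([a-z-]+):( *)(.*)$/.exec(line);
        if (!match) {
            problems.push("not a declaration: " + line);
            continue;
        }
        var property = match[1];
        if (match[2] !== " ") {
            problems.push(property + ": need one space after the colon");
        }
        if (match[3].slice(-1) !== ";") {
            problems.push(property + ": missing terminating semi-colon");
        }
        if (property < previous) {
            problems.push(property + ": not in alphabetical order");
        }
        previous = property;
    }
    return problems;
}
```

For example, a rule-set body in which left:0px; follows min-width: 0; yields two problems for left: the missing space after the colon and the ordering.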

HTML

  1. Any web page output by Yioop should validate as HTML5. This can be checked at the site http://validator.w3.org/.
  2. Any web page output by Yioop should pass the Web accessibility checks of the WAVE Tool.
  3. Web pages should render reasonably similarly in any version of Chrome, Firefox, Internet Explorer, Opera, or Safari released since 2009. To test this, it generally suffices to test a 2009 version of each of these browsers together with a current version.
  4. All tags in a document should be closed, but short forms of tags are allowed. For example, a <p> tag must have a corresponding close tag </p>, while a void element such as br should use the short open-close form <br />.
  5. All tag attributes should have their values in single or double quotes:
     <tag attribute1='value1' attribute2='value1' >
     not
     <tag attribute1=value1 attribute2=value1 >
    
  6. For the benefit of those still using Internet Explorer 6, the name attribute value of a tag should differ from its id attribute value. For multi-word name attribute values, separate words with underscores; for id attributes, separate them with hyphens. For example,
     <input id="some-form-field" name="some_form_field" type="text" />
    
  7. HTML code is output in views, elements, helpers, and layouts in Yioop. This code might be seen in one of two contexts: either by directly looking at the source code of Yioop (so one can see the PHP code, etc.) or in a browser or other client when one uses the client's "View Source" feature. Code should look reasonably visually appealing in either context, but with preference given to how it looks as source code. However, client-side HTML is often a useful tool for debugging, so it should not be entirely neglected.
  8. Dynamically generated HTML should not be output all on one line. Client-side HTML should avoid lines longer than 80 characters as well.
  9. Although not as strictly followed as for braces, an attempt should be made to align block-level elements. For such an element, one should often place the starting and ending tag on a line by itself and nest the contents by four spaces, if possible. This is not required if the indentation level would be too deep to easily read the line. Inline elements can be more free-form:
     <ol>
         <li>Although not as strictly followed as for braces, an attempt
         should be made to align block-level elements. For such an element,
         one should often place the starting and ending tag on a line by
         itself and nest the contents by <b>four spaces</b>, if possible.
         This is not required if the indentation level would be too deep to
         easily read the line. Inline elements can be more free-form:
         </li>
     </ol>
    
    Notice that we indent inside the ol tag. We don't start the text of the li tag on a separate line, since that might add a space to the rendered output; we do, however, put the close tag on a line by itself. In the above, the b tag is inlined.
  10. Here are some examples of splitting long lines in HTML:
     <!-- Long open tags -->
    
     <!-- case where content start and end spacing affects output -->
     <tag attr1="value1" attr2="value2"
         attr3="value3">contents</tag>
    
     <!-- or, if it doesn't affect output: -->
     <tag attr1="value1" attr2="value2"
         attr3="value3">
         contents
     </tag>
    
     <!-- Long urls should be split near '/', '?', '&'. Most browsers
         ignore a carriage return (without spaces) at such places in a url
     -->
     <a href="http://www.cs.sjsu.edu/faculty/
     pollett/masters/Semesters/Fall10/vijaya/index.shtml">Vijaya Pamidi's
     master's pages</a>
    
  11. Urls appearing in HTML should use the HTML entity for ampersand, &amp;, rather than a bare &. Browsers treat these the same, and this often helps with validation issues.
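When a link is built from a url assembled in code, the escaping in item 11 is easy to forget. The following Javascript sketch shows the idea; urlForAttribute is a hypothetical helper (for instance, for links built on the client), not a function from Yioop, and escaping quotes and angle brackets as well as ampersands is an added precaution beyond what item 11 asks.

```javascript
/*
 *  Prepares a url for a quoted HTML attribute by writing each ampersand
 *  as &amp;. Quotes and angle brackets are escaped as well, so that the
 *  value cannot end the attribute early. urlForAttribute is hypothetical.
 *
 *  @param String url  a url that has not yet been HTML-escaped
 *  @return String  the url, safe to place between attribute quotes
 */
function urlForAttribute(url)
{
    // & must be replaced first so the other entities are not re-escaped
    return url.replace(/&/g, "&amp;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&#39;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;");
}
```

Since every ampersand is escaped, the helper should be applied exactly once: applying it to an already-escaped url turns &amp; into &amp;amp;.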

SQL

SQL in Yioop typically appears embedded in PHP code. This section briefly describes some minor issues with the formatting of SQL, and, in general, how Yioop code should interact with databases.
  1. Except in subclasses of DatasourceManager, Yioop PHP code should not directly call native PHP database functions. That is, it should not call functions with names beginning with db2_, mysql_, mysqli_, pg_, oci_, sqlite_, etc., or use similar PHP classes. A DatasourceManager object exists as the $db field variable of any subclass of Model.
  2. SQL should not appear in Yioop in any functions or classes other than subclasses of Model.
  3. SQL code should be in uppercase. An example PHP string of SQL code might look like:
     $sql = "SELECT LOCALE_NAME, WRITING_MODE ".
         " FROM LOCALE WHERE LOCALE_TAG = ?";
    
  4. New table names and field names created for Yioop should also be uppercase only.
  5. Multi-word names should be separated by an underscore: LOCALE_NAME, WRITING_MODE, etc.
  6. New tables added to Yioop should preserve the database's BCNF normalization. Denormalization should be avoided.
  7. Yioop's DatasourceManager class has a facility for prepared statements. Using prepared statements should be preferred over escaping query parameters. Below is an example of a prepared statement in Yioop called from a model:
     $sql = "INSERT INTO CRAWL_MIXES VALUES (?, ?, ?, ?)";
     $this->db->execute($sql, [$timestamp, $mix['NAME'],
         $mix['OWNER_ID'], $mix['PARENT']]);
    
    Notice how the values that are to be filled in for the ?'s are listed in order in the array. The execute method caches the last statement it has seen, so if you call $db->execute twice in a row with the same statement, the second call skips the lower-level prepare call to the database. You can also use named parameters, as in the following example:
     $sql = "UPDATE VISITOR SET DELAY=:delay, END_TIME=:end_time,
         FORGET_AGE=:forget_age, ACCESS_COUNT=:access_count
         WHERE ADDRESS=:ip_address AND PAGE_NAME=:page_name";
     $this->db->execute($sql, [
         ":delay" => $delay, ":end_time" => $end_time,
         ":forget_age" => $forget_age,
         ":access_count" => $access_count,
         ":ip_address" => $ip_address, ":page_name" => $page_name]);
    
  8. In the rare case where a non-prepared statement is used, strings should be properly escaped using DatasourceManager::escapeString. For example,
     $sql = "INSERT INTO LOCALE".
         "(LOCALE_NAME, LOCALE_TAG, WRITING_MODE) VALUES".
         "('".$this->db->escapeString($locale_name).
         "', '".$this->db->escapeString($locale_tag) .
         "', '".$this->db->escapeString($writing_mode)."')";
    
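The statement caching described in item 7 can be pictured with the following Javascript sketch. It is not Yioop's DatasourceManager code: makeExecutor is hypothetical, and the prepare callback stands in for a database driver's lower-level prepare call.

```javascript
/*
 *  Builds an execute function that prepares a statement only when its
 *  SQL text differs from that of the previous call, and otherwise reuses
 *  the last prepared statement.
 *
 *  @param Function prepare  maps SQL text to an object with a run method
 *  @return Function  execute(sql, params) running sql with params
 */
function makeExecutor(prepare)
{
    var last_sql = null;
    var last_statement = null;
    return function execute(sql, params) {
        if (sql !== last_sql) {
            // only a change of statement text costs a new prepare call
            last_statement = prepare(sql);
            last_sql = sql;
        }
        return last_statement.run(params);
    };
}
```

Because only the most recent statement is remembered, alternating between two statements prepares each one every time; grouping repeated executions of the same statement together gets the most out of such a cache.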
     

Localization

Details on how Yioop can be translated into different languages can be found in the Yioop Localization Documentation. The General section of this document describes which things a coder should localize. In this section, we describe a little about what constitutes a good translation, and then talk a little about how, as a coder, you should add new strings to be localized. We also make some remarks on how localization patches should be created before posting them to the issue tracker. The seekquarry.com site is also localizable. If you are interested in translating the Yioop documentation or pages on seekquarry.com, drop me a line at: chris@pollett.org.
  1. It can take quite a long time to translate all the strings in Yioop. Translations of only some of the missing strings for a locale are welcome! Preference should be given to strings that an end-user is likely to see. In order of priority, one should translate string ids beginning with search_view_, pagination_helper_, search_controller_, signin_element_, settings_view_, settings_controller_, web_layout_, signin_view_, static_view_, and statistics_view_.
  2. For static pages, there are two versions: those included with the Yioop download, and those on the yioop.com site. The order of translation should be: privacy.thtml, bot.thtml, 404.thtml, and 409.thtml. For translations of the privacy statement for yioop.com, you should add a sentence saying that the meaning of the English statement takes precedence over any translation.
  3. Localization should be done by a native (or close to) speaker of the language Yioop is being translated to. Automated translations using things like Google Translate should be avoided. If used, such translations should be verified by a native speaker before being used.
  4. There are three main kinds of text which might need to be localized in Yioop: static strings, dynamic strings, and static pages.
  5. Text that has the potential to be output by the Yioop web interface should only appear in views, elements, helpers, layouts, or controllers. Controllers should not output such text directly; they should pass the string to be translated to a view, which in turn outputs it.
  6. If you need Javascript to output a translatable string, use a PHP controller to output a Javascript variable into $data['SCRIPT'], then have your Javascript make use of this variable to provide translation on the client. External .js files should not contain PHP code. An example of using this mechanism is given by the files mix.js and admin_controller.php's editMix member function.
  7. String ids should be all lowercase, with an underscore used to separate words. They should follow the convention: file_name_approximate_english_translation. For example, signin_view_password is a string id which appears in the views/SigninView.php file, and in English is translated as Password.
  8. Dynamic string ids are string ids stored in the database, which may be added by administrators after downloading Yioop. String ids for these strings should all be in the format: db_use_case_translation. For example, db_activity_manage_locales or db_subsearch_images.
  9. All suggested localizations posted to the issue tracker should be UTF-8 encoded.
  10. If the only string ids you have translated are static ones, you can just make a new issue in the issue tracker and post the relevant configure.ini file. These files are located in the Yioop Work Directory/locale/locale_in_question folder. Ideally, you should add strings through Manage Locales, which will modify this file for you.
  11. For dynamic string translations just cut-and-paste the relevant line from Edit Locales into a new note for your issue.
  12. Wiki pages for the Public and Help groups are also useful to have translated. Again, how to set up localizations of wiki pages is described in the Yioop Localization Documentation. Once you have translated these pages on your local system, you can run the script configs/ExportPublicHelpDb.php to export this information to the file APP_DIR/configs/PublicHelpPages.php, which you can send.
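The string id conventions in items 7 and 8 can be checked with a short Javascript sketch. Both functions are hypothetical, and deriving the prefix by splitting a camel-cased file name into lowercase words is an assumption that matches the examples in item 7 (SigninView.php and signin_view_password) and the prefixes listed in item 1.

```javascript
/*
 *  Computes the string id prefix for strings used in a PHP file, so that
 *  SigninView.php gives signin_view_ as in item 7
 *
 *  @param String file_name  a camel-cased PHP file name
 *  @return String  the file's words, lowercase, each followed by "_"
 */
function stringIdPrefix(file_name)
{
    var base = file_name.replace(/\.php$/, "");
    // insert an underscore before each capital that starts a new word
    return base.replace(/([a-z0-9])([A-Z])/g, "$1_$2").toLowerCase() + "_";
}

/*
 *  Tests whether a string id has the db_use_case_translation shape that
 *  item 8 requires of dynamic string ids
 *
 *  @param String string_id  the id to test
 *  @return Boolean  whether the id is "db" followed by lowercase words
 */
function isDynamicStringId(string_id)
{
    return /^db(_[a-z0-9]+)+$/.test(string_id);
}
```

For example, stringIdPrefix("PaginationHelper.php") gives pagination_helper_, one of the high-priority prefixes in item 1.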

Code-base Organization

This section describes what code should be put where when writing new code for Yioop. It can also serve as a rough guide to where to find things. Code organization also helps ensure the security of the overall Yioop software. Some of the material in this section overlaps with what is described in the Summary of Files and Folders and the Building a Site using Yioop as a Framework sections of the main Yioop documentation. All folder paths listed in this section are with respect to the Yioop INSTALL_DIR/src folder, or, in the case of tests, just the INSTALL_DIR folder.
  1. There are two main categories of apps in Yioop: the command line tools and programs, and the Yioop web app.
  2. Core libraries common to both kinds of apps should be put in the library folder. One exception is DatasourceManager and its subclasses: DatasourceManager has database and filesystem functions which might be useful to both kinds of apps, but it is contained in models/datasources. The easiest way to create an instance of this class is with a line like:
     $model = new Model(); // $model->db will be a DatasourceManager
    
  3. Some command-line programs such as executables/Fetcher.php and executables/QueueServer.php communicate with the web app either through curl requests or by file-based message passing. As a crude way to check the integrity of these messages, as well as to reduce the size of serializations of the messages sent, the CrawlConstants interface defines a large number of shared class constants. This interface is then implemented by all classes that need this kind of message passing. CrawlConstants is defined in the file library/CrawlConstants.php.
  4. Command-line tools useful for general Yioop configuration, together with the Yioop configuration files Config.php and LocalConfig.php, should be put in the configs folder. Some examples are: ConfigureTool.php and Createdb.php.
  5. All non-configuration command-line tools should be in the executables folder.
  6. Example scripts such as the file search.php which demonstrates the Yioop search API should be in the examples folder.
  7. External Javascripts should be in the scripts folder, CSS should be in the css folder, images should be in the resources folder, and sqlite3 databases should be in the data folder.
  8. Code (PHP and Javascript) related to a particular locale should be in the folder locale/locale-tag/resources. Examples of this are the files: locale/en_US/resources/locale.js and locale/en_US/resources/Tokenizer.php.
  9. Unit tests and coding experiments (the latter might test different aspects about speed and memory usage of PHP or Javascript constructs) should be in the tests folder. Auxiliary files to these tests and experiments should be put in tests/test_files.
  10. Unit tests should be written for any new lib folder files. Unit tests should be a subclass of UnitTest which can be found in library/UnitTest.php. The file name for a unit test should end in Test.php to facilitates it detection by tests/index.php which is used to run the tests. As much as possible unit tests should be written for executables folder programs and the web app as well.
  11. Command-line tools should have a check that they are not being run from the web such as:
     // if the command-line program does not have a unit test
     if(php_sapi_name() != 'cli') {echo "BAD REQUEST"; exit();}
     
     // if the command-line program has a unit test
     if (!defined("seekquarry\\yioop\\configs\\UNIT_TEST_MODE")) {
         if (php_sapi_name() != 'cli') {echo "BAD REQUEST"; exit();}
     }
    
  12. Files other than command-line programs, ./index.php, and ./tests/index.php should not define the UNIT_TEST_MODE constant. All code in non-command-line programs should be in class, interface, or function definitions; i.e., these files should have no globally executing statements.
  13. The entry points into the web app should output the HTTP header:
     header("X-FRAME-OPTIONS: DENY");
    
    to try to prevent clickjacking.
  14. The only file to specify the autoloader for Yioop is configs/Config.php. It also defines important namespace level constants. It is require'd by both library/Utility.php and library/LocaleFunctions.php. Requiring any one of these files thus specifies the autoloader and important namespace constants. The base classes Controller, Model, View, etc., each require at least one of these files, so subclasses shouldn't.
  15. The Yioop web app has the following kinds of files: controllers, models, and views (the main three); and components, elements, helpers, and layouts (the lesser ones). These should be put, respectively, into the folders controllers, models, views, controllers/components, views/elements, views/helpers, and views/layouts. The file name of each of these should end with its type; for example, a view's file name should end with View.php, as in MyView.php.
  16. A view roughly corresponds to one web page, a layout is used to render common page headers and footers for several views, an element is used for a relatively static portion of a web page which might appear in more than one view, and a helper is used to dynamically render a web page element such as a select tag according to passed PHP variables.
  17. Views, elements, and layouts should contain minimal PHP and be mostly HTML. In these classes, for, while, and other loops should be avoided. PHP in these classes should be restricted to simple conditionals and echos of $data variable fields.
  18. Control logic involving conditionals, loops, etc. should be put in controllers or components. Components are collections of related methods which might be used by several controllers. The controller's static field $component_activities is used to define which components live on a controller and what activities from that component are allowed. A component has a $parent field that allows access to the controller it currently lives on.
  19. In the web app, only models should access the file system or a database.
  20. Variables whose values come from a web client should be cleaned before being used by a view or a model. Subclasses of Controller have a clean() member function for this purpose. Further, DatasourceManager has an escapeString method, which should be used on strings before a Model inserts them into a database.
  21. Models, views, elements, helpers, and layouts should not use the $_GET, $_POST, or $_REQUEST super-globals. Controllers should not use $_GET and $_POST; at most they should use $_REQUEST. This makes it easier to change whether HTTP GET or POST is used. Also, using the same variable name for both a GET and a POST variable is evil, and this restriction may (or may not) help in catching such errors.
  22. For controllers that use the $_SESSION super-global, the session should be checked against cross-site request forgery. This should be done in the processRequest method using code like:
     if(isset($_SESSION['USER_ID'])) {
         $user = $_SESSION['USER_ID'];
     } else {
         $user = $_SERVER['REMOTE_ADDR'];
     }
     
     $data[CSRF_TOKEN] = $this->generateCSRFToken($user);
     $token_okay = $this->checkCSRFToken(CSRF_TOKEN, $user);
     if($token_okay) {
         //now can do stuff
     }
    
  23. When creating a new release of Yioop, one should check whether any required database or locale changes were made since the last version. If database changes have been made, then configs/Createdb.php should be updated, and library/UpgradeFunctions.php should have a new upgradeDatabaseVersion function added. If locale changes need to be pushed from the BASE_DIR/locale files to the WORK_DIRECTORY/locale files when people upgrade, then one should change the version number on the view_locale_version string id; for example, the string id view_locale_version0 might become view_locale_version1. This string id is in views/View.php. It is never actually output to the UI; it is used only for this purpose. Similarly, a number of variables control whether client-side HTML5 localStorage data from the previous release will still work with the new release. If it won't, then the relevant version number should be updated. An example of such a variable is SUGGEST_VERSION_NO in suggest.js.
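Item 22 relies on generateCSRFToken() and checkCSRFToken(). To show the idea behind such a check, here is a minimal, self-contained sketch of how a token tied to a user might be generated and verified with an HMAC and a timestamp. It is not Yioop's implementation: the function names makeCsrfToken() and verifyCsrfToken(), the secret key, the one-hour lifetime, and the "mac*timestamp" token format are all assumptions.

```php
<?php
// Hypothetical sketch only: this is NOT Yioop's generateCSRFToken() or
// checkCSRFToken(). The secret key, the token lifetime, and the
// "mac*timestamp" token format are assumptions made for illustration.

const CSRF_SECRET = 'change-me';  // assumed site-wide secret key
const CSRF_LIFETIME = 3600;       // assumed token lifetime, in seconds

/**
 * Makes a token binding $user (a user id, or a client IP address for
 * visitors who are not logged in) to the time the token was issued.
 */
function makeCsrfToken(string $user, ?int $now = null): string
{
    $now = $now ?? time();
    $mac = hash_hmac('sha256', $user . '*' . $now, CSRF_SECRET);
    return $mac . '*' . $now;
}

/**
 * Returns true only if $token was issued to $user and has not expired.
 */
function verifyCsrfToken(string $token, string $user,
    ?int $now = null): bool
{
    $now = $now ?? time();
    $parts = explode('*', $token);
    if (count($parts) !== 2 || !preg_match('/^\d+$/', $parts[1])) {
        return false;  // malformed token
    }
    $mac = $parts[0];
    $issued = (int)$parts[1];
    if ($issued > $now || $now - $issued > CSRF_LIFETIME) {
        return false;  // issued in the future, or expired
    }
    $expected = hash_hmac('sha256', $user . '*' . $issued, CSRF_SECRET);
    // hash_equals() compares in constant time to avoid timing leaks
    return hash_equals($expected, $mac);
}

// Usage: in a controller, $user would come from $_SESSION['USER_ID'] or
// $_SERVER['REMOTE_ADDR'], as in item 22.
$token = makeCsrfToken('42');
var_dump(verifyCsrfToken($token, '42'));  // bool(true)
var_dump(verifyCsrfToken($token, '43'));  // bool(false)
```

Binding the token to both the user and an issue time means a request forged from another site cannot supply a valid token, and a leaked token stops working once it expires.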
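Item 23 calls for a new upgradeDatabaseVersion function in library/UpgradeFunctions.php whenever the database changes. One way such functions can be organized, sketched here under assumptions, is to number them by version and have a driver run each one between the installed version and the target version, in order. The names upgradeDatabaseVersion1, upgradeDatabaseVersion2, and upgradeDatabase(), and the array standing in for a real database, are hypothetical; the actual file may be organized differently.

```php
<?php
// Hypothetical sketch of version-numbered upgrade functions. The function
// names, the driver, and the array standing in for a database are
// assumptions, not the contents of library/UpgradeFunctions.php.

function upgradeDatabaseVersion1(array &$db): void
{
    $db['tables']['USERS'][] = 'EMAIL';  // e.g., add a column
}

function upgradeDatabaseVersion2(array &$db): void
{
    $db['tables']['GROUPS'] = ['GROUP_ID', 'NAME'];  // e.g., add a table
}

/**
 * Runs upgradeDatabaseVersionN for each N from the installed version + 1
 * up to $target, recording the new version number after each step.
 */
function upgradeDatabase(array &$db, int $target): void
{
    for ($v = $db['version'] + 1; $v <= $target; $v++) {
        $function = 'upgradeDatabaseVersion' . $v;
        if (!function_exists($function)) {
            throw new RuntimeException("Missing upgrade function $function");
        }
        $function($db);
        $db['version'] = $v;
    }
}

$db = ['version' => 0, 'tables' => ['USERS' => ['USER_ID', 'USER_NAME']]];
upgradeDatabase($db, 2);
echo $db['version'], "\n";                        // 2
echo implode(',', $db['tables']['USERS']), "\n";  // USER_ID,USER_NAME,EMAIL
```

Recording the version after every step means an interrupted upgrade can resume from the last function that succeeded rather than rerunning earlier ones.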

Issue Tracking/Making Patches/Commit Messages

In this section, we discuss the Yioop issue tracker and how to use the git version control system to make and apply patches for Yioop.
  1. If you would like to contribute code to Yioop, but don't yet have an account on the issue tracker, you can sign up for an account.
  2. Once you have an account and are logged in, click the Report Issue link to report an issue. Fill in as many report fields as you can and give as much detail as possible. In particular, you should select a Product Version.
  3. The Upload File fieldset lets you upload files to an issue, and the Add Note fieldset lets you add new notes. This is where you could upload a patch. By default, a new account is a Reporter level account, which does not let you set anyone other than yourself to monitor (get email about) the issue. However, the administrator will be aware that the issue was created.
  4. A developer level account will allow you to change the status of issues, update/delete issues, set who is monitoring an issue, and assign issues to individuals. This can be done through the fieldset just beneath Attached Files.
  5. Information about Git, Git clients, etc. can be obtained from http://git-scm.com/. Here we describe a typical workflow for coding Yioop using Git.
  6. After installing git, make sure to configure your user name and email address:
     % git config --global user.name "Chris Pollett"
     % git config --global user.email "chris@pollett.org"
    
    You should of course change the values above to your name and email. To see your current configuration settings you can type:
     % git config -l
    
    If you want to remove any settings you can type:
     % git config --unset some.setting.you.dont.want
    
    Setting the user name and email ensures that you receive credit/blame for any changes that end up in the main git repository. To see who is responsible for which lines in a file, use the git blame command. For example:
     % git blame yioopbar.xml
     ad3c397c (Chris Pollett 2010-12-28 00:27:38 -0800  1) <?xml version="1.0" e
     ad3c397c (Chris Pollett 2010-12-28 00:27:38 -0800  2) <OpenSearchDescriptio
     ad3c397c (Chris Pollett 2010-12-28 00:27:38 -0800  3) <ShortName>Yioop<
     ad3c397c (Chris Pollett 2010-12-28 00:27:38 -0800  4) <Description>Quickly
     ad3c397c (Chris Pollett 2010-12-28 00:27:38 -0800  5) <InputEncoding>UTF-8
     ad3c397c (Chris Pollett 2010-12-28 00:27:38 -0800  6) <Image width="16" hei
     774eb50d (Chris Pollett 2012-12-31 10:47:57 -0800  7) <Url type="text/html"
     774eb50d (Chris Pollett 2012-12-31 10:47:57 -0800  8)     template="http://
     ad3c397c (Chris Pollett 2010-12-28 00:27:38 -0800  9) </Url>
     774eb50d (Chris Pollett 2012-12-31 10:47:57 -0800 10) </OpenSearchDescripti
    
  7. To make a new copy of the most recent version of Yioop one can run the git clone command:
     % git clone https://seekquarry.com/git/yioop.git yioop
    
    This creates a copy of the Yioop repository in a folder yioop in the current directory. Thereafter, to bring this copy up to date with the most recent version of Yioop, one can issue the command:
     % git pull
    
  8. Once you have a git clone of Yioop, or have done a git pull of the most recent changes to Yioop, you can start coding! After coding for a while, you should run git status to see which files you have changed. For example,
     % git status
     # On branch master
     # Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
     #
     # Untracked files:
     #   (use "git add <file>..." to include in what will be committed)
     #
     #    tmp.php
     nothing added to commit but untracked files present (use "git add" to track)
    
    This says there has been one commit to the main repository since your clone or last git pull, and that we could bring our copy up to date with a simple git pull. It also says, however, that there is an untracked file in the repository. If we made this file intending to add it to Yioop, we should git add it. For example,
     % git add tmp.php
    
    Now we could try to do a git pull. Suppose we get the message:
     Updating e3e4f20..a9a8ed9
     error: Your local changes to the following files would be overwritten by merge:
         tmp.php
     Please, commit your changes or stash them before you can merge.
     Aborting
    
    This means that someone else has also added a tmp.php, and the two versions conflict. To merge them, we first commit our version:
     % git commit -a -m "Fixes Issue 987, Yioop needs a tmp.php file, a=chris"
     [master 3afe055] Fixes Issue 987, Yioop needs a tmp.php file, a=chris
      1 file changed, 4 insertions(+)
      create mode 100644 tmp.php
    
    The option -a tells git to automatically stage all changes (modifications and deletions) to files git already tracks, i.e., files we have git add'd, before committing. The option -m is used to give an inline message. In Yioop, the general format of such a message is: the number of the issue in the issue tracker being fixed, a brief English summary of that issue, and under whose authority the commit is being done. The authority is written either as a=chris, where a means approved and the approver (here chris) is senior enough to commit unreviewed changes, or as r=someone, where someone is the person asked in the issue to review your commits before they are pushed. Administrator commits often have no associated issue in the issue tracker, in which case the format reduces to: a useful English description of the change, a=username of the administrator. Now that we have done the above commit, we can try the git pull again:
     % git pull
     Auto-merging tmp.php
     CONFLICT (add/add): Merge conflict in tmp.php
     Automatic merge failed; fix conflicts and then commit the result.
     % cat tmp.php
     <?php
     <<<<<<< HEAD
      echo "hello";
      echo "good bye";
     =======
     >>>>>>> a9a8ed990108598d06334e29c0eb37d98f0845aa
     ?>
    
    The listing of tmp.php above has a conflict block delimited by the lines <<<<<<< HEAD, =======, and >>>>>>> a9a8ed990108598d06334e29c0eb37d98f0845aa. In this case there is only one such block; in general, there could be many. The lines between <<<<<<< HEAD and ======= come from the local repository, and the lines between ======= and >>>>>>> come from the remote repository. So the local copy has the two lines:
     echo "hello";
     echo "good bye";
    
    that are not in the remote repository. On the other hand, the remote repository has nothing that is not in the local copy. So we can fix this conflict by editing the block to look like:
     <?php
      echo "hello";
      echo "good bye";
     ?>
    
    In general, we must fix every conflict block in a file, and conflicts can occur in more than one file, so each file with conflicts must be fixed. Once this is done, to tell git we have resolved the conflicts, we can type:
     % git add tmp.php
     % git commit
     [master e5ebf9f] Merge branch 'master' of https://seekquarry.com/git/yioop
    
    Here we didn't use -m, so git opened a text editor (vi, in this case), where we kept the default commit message. Now we can go back to editing our local copy of Yioop. If we do a git pull at this point, we will get the message: "Already up-to-date."
  9. The "opposite" of git pull is git push. Most casual Yioop developers don't have push privileges on the main Yioop repository. For someone who did, a possible development workflow would be: pull the master copy of Yioop to a local branch, make your changes, and post a patch to the bug/issue in question on the issue tracker, asking someone (probably the administrator, which is me, Chris Pollett) to review it. The reviewer gives a thumbs up or down. If it is a thumbs up, you push your changes back to the master branch. Otherwise, you revise your patch and try again. To configure git so that git push works, you can either make a ~/.netrc file with
     machine seekquarry.com
     login <username>
     password <password>
    
    in it, chmod it to 600, and type:
     % git config remote.upload.url https://seekquarry.com/git/yioop.git
    
    or you can just type the command:
     % git config remote.upload.url \
         https://<username>@seekquarry.com/git/yioop.git
    
    After this, you should be able to use the command:
     % git push upload master
    
    This pushes your local changes back to the repository. With the second method, you will be prompted for your password. Another common setting you might want to change is http.sslVerify. If you are getting error messages such as
     error: server certificate verification failed. CAfile:
     /etc/ssl/certs/ca-certificates.crt CRLfile: none
     while accessing https://seekquarry.com/git/yioop.git/info/refs
    
    you might want to use the command:
     % git config --global --add http.sslVerify false
    
  10. In the workflow above, the changes we make to our local repository should be reviewed before we push them back to the Yioop repository. To do this review, we need to make a patch, upload the patch to the issue tracker, and add someone who could review it to the issue's monitor list, asking them to do a review. These last two steps require at least a developer account on the issue tracker. Anyone who registers for the issue tracker initially gets a reporter account. If you would like to code for Yioop and have already made a patch, you can send an email to chris@pollett.org to request that your account be upgraded to a developer account. New developers do not get push access to the Yioop repository. For such a developer, the workflow is: create a patch, post it to an issue on the issue tracker, get it approved by an administrator reviewer, and then the reviewer pushes the result to the main Yioop repository.
  11. After coding, but before making a patch, you should run executables/CodeTool.php to remove any stray tab characters or spaces at the end of lines. This program can be run either on a single file or on a folder. For example, one could type:
     % php executables/CodeTool.php clean tmp.php
    
    This assumes you are in the Yioop base directory and that tmp.php is also located there. You should also run the command:
     % php executables/CodeTool.php longlines tmp.php
    
    to check for lines over 80 characters.
  12. To make a patch, we start with an up-to-date copy of Yioop obtained either by doing a fresh clone or by doing a git pull. Suppose we create a couple of new files, add them to our local repository, do a commit, delete one of these files, make a few more changes, and commit the result. On a Mac or Linux system, this might look like:
     % ed test1.php
     test1.php: No such file or directory
     a
     <?php
     ?>
     .
     wq
     9
     % ed test2.php
     test2.php: No such file or directory
     a
     <?php
     ?>
     .
     wq
     9
     % git add test1.php
     % git add test2.php
     % git commit -a -m "Adding test1.php and test2.php to the repository"
     [master 100f787] Adding test1.php and test2.php to the repository
      2 files changed, 4 insertions(+)
      create mode 100644 test1.php
      create mode 100644 test2.php
     % ed test1.php
     9
     1
     <?php
     a
         phpinfo();
     .
     wq
     24
     % git rm test2.php
     rm 'test2.php'
     % ls
     ./          README*       data/        locale/       search_filters/
     ../         bin/          error.php*   models/       test1.php
     .DS_Store*  blog.php*     examples/    my.patch      tests/
     .git/       bot.php*      extensions/  privacy.php*  views/
     .gitignore  configs/      favicon.ico  resources/    yioopbar.xml
     INSTALL*    controllers/  index.php*   robots.txt
     LICENSE*    css/          lib/         scripts/
     % git commit -a -m "Adding phpinfo to test1.php, removing test2.php"
     [master 7e64648] Adding phpinfo to test1.php, removing test2.php
      2 files changed, 1 insertion(+), 2 deletions(-)
      delete mode 100644 test2.php
    
    Presumably, you will use a less ancient editor than ed, though ed does have the virtue of not clearing the screen, which makes it easy to cut and paste what we did. We now want to make a patch consisting of all the commits since we did the git pull. First, we find the name of the commit made just before we started modifying things by running git log -3, which lists information about the last three commits. If you had done more or fewer commits since the git pull, then -3 would be different. We see the name is e3e4f20674cf19cf5840f431066de0bccd1b226c. The first several characters of this hash are enough to identify the commit uniquely, so we copy them. To make a patch with git, one uses the format-patch command. By default, this makes a separate patch file for each commit after the starting commit we choose. To make a single patch file instead, we use the --stdout option and redirect the output to my.patch. We can then use the cat command to list the contents of my.patch. This sequence of commands looks like the following:
     % git log -3
     commit 7e646486faa35f69d7322a8e4fca12fb6b457b8f
     Author: Chris Pollett <chris@pollett.org>
     Date:   Tue Jan 1 17:32:00 2013 -0800
     
         Adding phpinfo to test1.php, removing test2.php
     
     commit 100f7870221d453720c90dcce3cef76c0d475cc8
     Author: Chris Pollett <chris@pollett.org>
     Date:   Tue Jan 1 16:35:02 2013 -0800
     
         Adding test1.php and test2.php to the repository
     
     commit e3e4f20674cf19cf5840f431066de0bccd1b226c
     Author: Chris Pollett <chris@pollett.org>
     Date:   Tue Jan 1 15:48:34 2013 -0800
     
         modify string id in settings_view, remove _REQUEST variable from machinelog_element, a=chris
     % git format-patch e3e4f2067 --stdout > my.patch
     % cat my.patch
     From 100f7870221d453720c90dcce3cef76c0d475cc8 Mon Sep 17 00:00:00 2001
     From: Chris Pollett <chris@pollett.org>
     Date: Tue, 1 Jan 2013 16:35:02 -0800
     Subject: [PATCH 1/2] Adding test1.php and test2.php to the repository
     
     ---
      test1.php |    2 ++
      test2.php |    2 ++
      2 files changed, 4 insertions(+)
      create mode 100644 test1.php
      create mode 100644 test2.php
     
     diff --git a/test1.php b/test1.php
     new file mode 100644
     index 0000000..acb6c35
     --- /dev/null
     +++ b/test1.php
     @@ -0,0 +1,2 @@
     +<?php
     +?>
     diff --git a/test2.php b/test2.php
     new file mode 100644
     index 0000000..acb6c35
     --- /dev/null
     +++ b/test2.php
     @@ -0,0 +1,2 @@
     +<?php
     +?>
     --
     1.7.10.2 (Apple Git-33)
     
     
     From 7e646486faa35f69d7322a8e4fca12fb6b457b8f Mon Sep 17 00:00:00 2001
     From: Chris Pollett <chris@pollett.org>
     Date: Tue, 1 Jan 2013 17:32:00 -0800
     Subject: [PATCH 2/2] Adding phpinfo to test1.php, removing test2.php
     
     ---
      test1.php |    1 +
      test2.php |    2 --
      2 files changed, 1 insertion(+), 2 deletions(-)
      delete mode 100644 test2.php
     
     diff --git a/test1.php b/test1.php
     index acb6c35..e2b4c37 100644
     --- a/test1.php
     +++ b/test1.php
     @@ -1,2 +1,3 @@
      <?php
     +    phpinfo();
      ?>
     diff --git a/test2.php b/test2.php
     deleted file mode 100644
     index acb6c35..0000000
     --- a/test2.php
     +++ /dev/null
     @@ -1,2 +0,0 @@
     -<?php
     -?>
     --
     1.7.10.2 (Apple Git-33)
    
  13. One should always list out the patch, as we did above, before posting it to the issue tracker. Doing so can reveal that the patch accidentally contains more than it should. It is also a good moment for one last check that the patch follows the Yioop coding guidelines.
  14. The last step before uploading the patch to the issue tracker is to check that the patch in fact works. To do this, make a fresh clone of Yioop and copy my.patch into the clone's folder. To see which files the patch affects, we can type:
     % git apply --stat my.patch
      test1.php |    2 ++
      test2.php |    2 ++
      test1.php |    1 +
      test2.php |    2 --
      4 files changed, 5 insertions(+), 2 deletions(-)
    
    Since there are two concatenated patches in my.patch, it first lists the two files affected by the first patch, then the two files affected by the second patch. To check whether the patch will cause any problems before applying it, one can type:
     % git apply --check my.patch
    
    Finally, to apply the patch we can type:
     % git am --signoff <  my.patch
     Applying: Adding test1.php and test2.php to the repository
     Applying: Adding phpinfo to test1.php, removing test2.php
    
    Here am stands for "apply mailbox": git am applies patches stored in mail format, such as those produced by format-patch. The --signoff option adds a Signed-off-by line with your name and email to each commit message, saying that you approved the commit. From the above we see that each patch within my.patch was applied in turn. To see what this signoff looks like, we can type:
     % git log -3
     commit aca40730c41fafe9a21d4f0d765d3695f20cc5aa
     Author: Chris Pollett <chris@pollett.org>
     Date:   Tue Jan 1 17:32:00 2013 -0800
     
         Adding phpinfo to test1.php, removing test2.php
     
         Signed-off-by: Chris Pollett <chris@pollett.org>
     
     commit d0d13d9cf3059450ee6b1b4a51db0d0fae18256c
     Author: Chris Pollett <chris@pollett.org>
     Date:   Tue Jan 1 16:35:02 2013 -0800
     
         Adding test1.php and test2.php to the repository
     
         Signed-off-by: Chris Pollett <chris@pollett.org>
     
     commit e3e4f20674cf19cf5840f431066de0bccd1b226c
     Author: Chris Pollett <chris@pollett.org>
     Date:   Tue Jan 1 15:48:34 2013 -0800
     
         modify string id in settings_view, remove _REQUEST variable from machinelog_element, a=chris
    
    At this point the patch seems good to go, so we can upload it to the issue tracker!
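The commit message convention from item 8 (an optional issue reference, a short English summary, and a trailing a=username or r=username tag) is regular enough to check mechanically before committing. The sketch below is a hypothetical helper, not part of Yioop or git, and its regular expression is one assumed reading of the convention.

```php
<?php
// Hypothetical helper (not part of Yioop or git) that splits a one-line
// commit message written in the convention
//     [Fixes Issue N, ]summary, a=username   or   ..., r=username
// The exact pattern below is an assumption.

/**
 * Returns ['issue' => ?int, 'summary' => string, 'kind' => 'a'|'r',
 * 'who' => string], or null if the message lacks an a= or r= tag.
 */
function parseCommitMessage(string $message): ?array
{
    $pattern = '/^(?:Fixes Issue (\d+),\s*)?(.+?),\s*([ar])=(\w+)\s*$/';
    if (!preg_match($pattern, trim($message), $m)) {
        return null;
    }
    return [
        // an unmatched optional group is reported by preg_match as ''
        'issue' => $m[1] === '' ? null : (int)$m[1],
        'summary' => $m[2],
        'kind' => $m[3],  // 'a' = approved by, 'r' = reviewed by
        'who' => $m[4],
    ];
}

print_r(parseCommitMessage(
    "Fixes Issue 987, Yioop needs a tmp.php file, a=chris"));
var_dump(parseCommitMessage("Adding phpinfo to test1.php"));  // NULL
```

The lazy summary group together with the end-of-string anchor means commas inside the summary are kept, and only the final ", a=" or ", r=" is treated as the authority tag.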
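To make the two CodeTool.php checks in item 11 concrete, the following sketch approximates them in plain PHP. It is not the code of executables/CodeTool.php: the function names cleanSource() and findLongLines() are hypothetical, and turning each tab into four spaces is an assumption about what "removing" a stray tab means.

```php
<?php
// Hypothetical approximation of the "clean" and "longlines" checks; this
// is NOT the code in executables/CodeTool.php. Replacing each tab with
// $tab_width spaces is an assumption.

/**
 * Replaces tab characters with spaces, then strips any spaces left at
 * the end of each line.
 */
function cleanSource(string $source, int $tab_width = 4): string
{
    $source = str_replace("\t", str_repeat(' ', $tab_width), $source);
    // with the m flag, $ matches just before each newline
    return preg_replace('/ +$/m', '', $source);
}

/**
 * Returns the 1-based numbers of lines longer than $max characters.
 * strlen() counts bytes, so lines containing multibyte UTF-8 text
 * may be over-counted.
 */
function findLongLines(string $source, int $max = 80): array
{
    $long = [];
    foreach (explode("\n", $source) as $index => $line) {
        if (strlen(rtrim($line, "\r")) > $max) {
            $long[] = $index + 1;
        }
    }
    return $long;
}

$code = "<?php\n\techo 'hi';   \n// " . str_repeat('x', 80) . "\n";
echo cleanSource($code);
echo implode(',', findLongLines($code)), "\n";  // 3
```

Running the whitespace cleanup before the length check matters in practice, since expanding tabs can push a line over the 80-character limit.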

New Version Quality Assurance Checklist

The following should be checked before creating a new release of Yioop:
  1. All unit tests pass.
  2. Included sqlite database default.db is up-to-date.
  3. Install guides are up to date, and installation can be performed as described on each of the major platforms (Linux variants, macOS, Windows, HHVM).
  4. Upgrade functions successfully upgrade Yioop from the last version. Upgrade functions need only be written from the last official release to the current official release.
  5. Yioop can perform a regular and archive crawl on each of the platforms for which an install guide has been made.
  6. Each kind of archive iterator has been tested on the development platform and still works.
  7. Multi-queue server crawls should be tested. Mirror and Media updater processes should be tested.
  8. Documentation reflects changes since the last version of Yioop, and screenshots have been updated.
  9. Source code documentation has been updated. The current command used to do this is:
     phpdoc -d ./yioop -t ./yioop_docs --ignore="work_directory/*,LocalConfig.php,*/vendor/*" \
     --sourcecode --title="Yioop_Vversion_Source_Code_Documentation"
    
    This should be executed from the directory one level above the Yioop source code folder.
  10. Each admin panel activity and each setting within each activity works as described.
  11. The web app still appears correctly in versions of the major browsers (Chrome, Edge, Firefox, Internet Explorer, Safari) released in the last two years.