Installation Guides

There are two main ways to install Yioop: using its own internal web server, or under an existing web server such as Apache. Yioop will probably run faster using its own internal web server; however, running under a traditional web server is likely slightly more stable. Below, instructions for installing in either setting are given for a variety of operating systems.

Demo Install Video

A half-hour demo of installing Yioop is available at yioop.com: Yioop Install Demo. The Yioop Tutorials Wiki on yioop.com has video tutorials for several of Yioop's features; this wiki also illustrates the Yioop software's ability to do video streaming.

Install Yioop Without a Web Server

The main idea in all of the instructions below is first to obtain a version of PHP configured so that it can run Yioop, and then to run Yioop. If you already have PHP installed on your machine, you may be able to skip directly to the steps for running Yioop.
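
Whatever your operating system, you can check which PHP you already have, and which extensions it was built with, from a terminal (a quick check, assuming php is on your PATH):
     php -v
     php -m
Here php -v reports the installed PHP version, if any, and php -m lists the enabled extensions -- look for names such as curl, mbstring, sqlite3, and gd from the lists below.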

Windows

  1. From the Windows 10 Search Task Bar, enter PowerShell, right-click on it, and run it as administrator.
  2. Install Chocolatey Package Manager:
     Set-ExecutionPolicy Bypass -Scope Process -Force; iex ((New-Object
     System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1'))
    
  3. Restart PowerShell as administrator.
  4. Next, install PHP, SQLite, and the Atom editor using Chocolatey, by typing at the PowerShell prompt:
     choco install php
     choco install sqlite
     choco install atom
    
  5. You should now have a shortcut to the Atom editor on your Desktop. Click on it, and within the editor open PHP's configuration file (for Chocolatey's PHP 7.2 this is typically):
     C:\tools\php72\php.ini
    
  6. Using find in the editor, locate the lines containing the following and remove the leading semicolons (alternatively, see the PowerShell sketch after this list):
     extension=bz2
     extension=curl
     extension=exif
     extension=fileinfo
     extension=gd2
     extension=mbstring
     extension=openssl
     extension=pdo_sqlite
     extension=sqlite3
    
  7. Save the php.ini file.
  8. Download Yioop and unzip it into
     C:\yioop
    
  9. From PowerShell, type:
     cd C:\yioop
     php index.php
    
  10. Yioop should now be running on port 8080. If you want Yioop to run on a different port, in the above you could have typed:
     php index.php some_other_port
    
  11. In a browser, go to the page http://localhost:8080/ . You should see the default search landing page for Yioop. Click Sign In and fill in the form as:
     Login: root
     Password: (leave blank)
    
  12. Now go to Yioop => Configure and alter the following settings:
     Search Engine Work Directory: (don't change)
     Default Language: (choose the language you want, or for now leave as English)
     Debug Display: (don't change)
     Search Access: (don't change)
     Crawl Robot Name: TestBot
     Robot Description: This bot is for test purposes. It respects robots.txt
    
    The crawl robot name is what will appear together with a url to a bot.php page in web server log files of sites you crawl. The bot.php page will display what you write in robot description. This should give contact information in case your robot misbehaves. Obviously, you should customize the above to what you want to say.
  13. Go to Manage Crawls. Click on the options to set up where you want to crawl. Type in a name for the crawl and click start crawl.
  14. Let it crawl for a while, until you see the Total URLs Seen > 1.
  15. Then click Stop Crawl and wait for the crawl to appear in the previous crawls list. Set it as the default crawl. You should be able to search using this index.
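
If you prefer not to edit php.ini by hand in step 6, the following PowerShell sketch uncomments all nine extensions in one pass (adjust $ini to the php.ini path you opened in step 5 if yours differs):
     $ini = 'C:\tools\php72\php.ini'
     (Get-Content $ini) -replace '^;(extension=(bz2|curl|exif|fileinfo|gd2|mbstring|openssl|pdo_sqlite|sqlite3))$', '$1' |
         Set-Content $ini
Run this from the same administrator PowerShell you used for the Chocolatey installs, since php.ini usually is not writable by ordinary users.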

macOS

  1. Install the Homebrew package manager. Open a terminal window and type:
     /usr/bin/ruby -e "$(curl -fsSL \
     https://raw.githubusercontent.com/Homebrew/install/master/install)"
    
  2. Install PHP. Type from the command line:
     brew install php
    
  3. Edit the php.ini file to enable the extensions Yioop needs. First, type at the command prompt:
     nano /usr/local/etc/php/7.2/php.ini
    
    Locate in the php.ini file the lines containing:
     extension=bz2
     extension=curl
     extension=exif
     extension=fileinfo
     extension=gd2
     extension=mbstring
     extension=openssl
     extension=pdo_sqlite
     extension=sqlite3
    
    and remove the semicolon at the start of each of these lines (or see the sed sketch after this list). Save the php.ini file.
  4. Download Yioop and unzip it onto your Desktop.
  5. From the terminal type:
     cd ~/Desktop/yioop
     php index.php
    
  6. Yioop should now be running on port 8080. If you want Yioop to run on a different port, in the above you could have typed:
     php index.php some_other_port
    
  7. In a browser, go to the page http://localhost:8080/ . You should see the default search landing page for Yioop. Click Sign In and fill in the form as:
     Login: root
     Password: (leave blank)
    
  8. Now go to Yioop => Configure and alter the following settings:
     Search Engine Work Directory: (don't change)
     Default Language: (choose the language you want, or for now leave as English)
     Debug Display: (don't change)
     Search Access: (don't change)
     Crawl Robot Name: TestBot
     Robot Description: This bot is for test purposes. It respects robots.txt
    
    The crawl robot name is what will appear together with a url to a bot.php page in web server log files of sites you crawl. The bot.php page will display what you write in robot description. This should give contact information in case your robot misbehaves. Obviously, you should customize the above to what you want to say.
  9. Go to Manage Crawls. Click on the options to set up where you want to crawl. Type in a name for the crawl and click start crawl.
  10. Let it crawl for a while, until you see the Total URLs Seen > 1.
  11. Then click Stop Crawl and wait for the crawl to appear in the previous crawls list. Set it as the default crawl. You should be able to search using this index.
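
If you would rather not uncomment the extensions in step 3 by hand, a single BSD sed command can make all nine edits at once (a sketch, assuming the php.ini path shown above):
     sed -i '' -E 's/^;(extension=(bz2|curl|exif|fileinfo|gd2|mbstring|openssl|pdo_sqlite|sqlite3))$/\1/' \
         /usr/local/etc/php/7.2/php.ini
The -i '' flag tells the macOS (BSD) version of sed to edit the file in place.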

Ubuntu/Debian Linux

The instructions described here have been tested on Ubuntu 18.04 LTS.
  1. Get PHP set-up by running the following commands as needed (you might have already done some). For Ubuntu 18.04 LTS type:
     sudo apt install curl
     sudo apt install php7.2-cli
     sudo apt install php7.2-mbstring
     sudo apt install php7.2-sqlite3
     sudo apt install php7.2-curl
     sudo apt install php7.2-gd
     sudo apt install php7.2-xml
     sudo apt install php7.2-bcmath
    
  2. Download Yioop, unzip it into /var/www, and use mv to rename the Yioop folder to yioop.
  3. Start Yioop using its own web server:
     cd /var/www/yioop
     php index.php
    
    This will run the web server on port 8080. To run on some other port:
     sudo php index.php some_other_port_number
    
  4. In a browser, go to the page http://localhost:8080/ (see also the curl check after this list). You should see the default search landing page for Yioop. Click sign in and fill in the form as:
     Login: root
     Password: (leave blank)
    
  5. Now go to Yioop => Configure and alter the following settings:
     Search Engine Work Directory: (don't change)
     Default Language: (choose the language you want, or for now leave as English)
     Debug Display: (don't change)
     Search Access: (don't change)
     Crawl Robot Name: TestBot
     Robot Description: This bot is for test purposes. It respects robots.txt
    
    The crawl robot name is what will appear together with a url to a bot.php page in web server log files of sites you crawl. The bot.php page will display what you write in robot description. This should give contact information in case your robot misbehaves. Obviously, you should customize the above to what you want to say.
  6. Go to Manage Crawls. Click on the options to set up where you want to crawl. Type in a name for the crawl and click start crawl.
  7. Let it crawl for a while, until you see the Total URLs Seen > 1.
  8. Then click Stop Crawl and wait for the crawl to appear in the previous crawls list. Set it as the default crawl. You should be able to search using this index.
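
Once php index.php is running, you can confirm the built-in web server is answering without leaving the terminal (a quick check using the curl installed in step 1):
     curl -s http://localhost:8080/ | head
This should print the first few lines of the HTML for Yioop's landing page.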

HHVM rather than PHP

HHVM is Facebook's open-source virtual machine for executing PHP. It can offer a significant performance speed-up over the traditional PHP interpreter prior to PHP 7. It also supports the Hack language, a variant of PHP. HHVM works for both Linux and macOS, but at this time (mid-2018) did not seem to be available for Windows. HHVM seemed to require slightly more memory to run without crashing, so make sure your machine has at least 8GB of memory.
  1. If on a Mac, install the Homebrew package manager.
  2. Install hhvm if it is not already installed (a version check appears after this list). On Linux:
     sudo apt install hhvm
    
    on a Mac (note that Homebrew should not be run with sudo):
     brew install hhvm
    
  3. Download Yioop and use mv to rename the Yioop folder to yioop.
  4. cd into the yioop folder.
  5. Run Yioop:
     hhvm index.php
    
    This will run the web server on port 8080. To run on some other port:
     sudo hhvm index.php some_other_port_number
    
  6. In a browser, go to the page http://localhost:8080/ . You should see the default search landing page for Yioop. Click sign in and fill in the form as:
     Login: root
     Password: (leave blank)
    
  7. Now go to Yioop => Configure and alter the following settings:
     Search Engine Work Directory: (don't change)
     Default Language: (choose the language you want, or for now leave as English)
     Debug Display: (don't change)
     Search Access: (don't change)
     Crawl Robot Name: TestBot
     Robot Description: This bot is for test purposes. It respects robots.txt
    
    The crawl robot name is what will appear together with a url to a bot.php page in web server log files of sites you crawl. The bot.php page will display what you write in robot description. This should give contact information in case your robot misbehaves. Obviously, you should customize the above to what you want to say.
  8. Go to Manage Crawls. Click on the options to set up where you want to crawl. Type in a name for the crawl and click start crawl.
  9. Let it crawl for a while, until you see the Total URLs Seen > 1.
  10. Then click Stop Crawl and wait for the crawl to appear in the previous crawls list. Set it as the default crawl. You should be able to search using this index.
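
Before starting Yioop, you can confirm which hhvm you ended up with (a quick sanity check):
     hhvm --version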

Install Yioop Under a Web Server

XAMPP on Windows

  1. Download Xampp. These directions were tested on Xampp 5.6.11.
  2. Install Xampp.
  3. Open Control Panel. Go to System => Advanced system settings => Advanced. Click on Environment Variables. Look under System Variables and select Path. Click Edit. Tack onto the end of Variable Values:
     ;C:\xampp\php;
    
    Click OK a bunch of times to dismiss the windows. Close the Control Panel window, then reopen it and go to the same place to make sure the Path variable really was changed (see also the check after this list). I then restarted the machine to make sure these settings took effect.
  4. Use the Xampp control panel to start at least Apache.
  5. Download Yioop and unzip it into
     C:\xampp\htdocs
    
    Rename the downloaded folder yioop (so you now have a folder C:\xampp\htdocs\yioop). Point your browser at:
     http://localhost/yioop/
    
  6. You should see the Yioop landing page. Login with username root and empty password.
  7. Now go to Yioop => Configure and alter the following settings:
     Search Engine Work Directory: (don't change)
     Default Language: (choose the language you want, or for now leave as English)
     Debug Display: (don't change)
     Search Access: (don't change)
     Crawl Robot Name: TestBot
     Robot Description: This bot is for test purposes. It respects robots.txt
    
  8. Crawl Robot Name is what will appear together with a url to a bot.php page in the web server log files of sites you crawl. The bot.php page will display what you write in robot description. This should give contact information in case your robot misbehaves. Obviously, you should customize the above to what you want to say.
  9. Now go to Manage Crawls. Click on Options. Set the options you would like for your crawl. Click Save.
  10. Type the name of the crawl and start crawl. Let it crawl for a while, until you see the Total URLs Seen > 1.
  11. Click stop crawl and wait for the crawl to appear in the previous crawls list. Set it as the default crawl. Then you can search using this index.
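
To verify the PATH change from step 3 took effect, open a fresh PowerShell window and check which php Windows now finds (a quick check):
     where.exe php
     php -v
The first command should list C:\xampp\php\php.exe among its results; the second should print the PHP version Xampp shipped with.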

Wamp

  1. Download WampServer. These instructions were tested on the 64 bit version of WampServer 2.5 that came with PHP 5.5.
  2. Download Yioop and unzip it into
     C:\wamp\www
    
  3. Rename the downloaded folder yioop (so you should now have a folder C:\wamp\www\yioop).
  4. Make sure PHP curl is enabled. To do this, use the Wamp dock tool, navigate to wamp => php => extension, and turn on curl. This enables curl in one of the php.ini files that WAMP uses.
  5. Unfortunately, Wamp has two php.ini files. The one we just edited by doing this is in
     C:\wamp\bin\apache\Apache2.4.9\bin
    
    You need to also edit the php.ini in
     C:\wamp\bin\php\php5.5.12
    
    Depending on your version of Wamp the PHP version number may be different. Open this file in an editor and make sure the line:
     extension=php_curl.dll
    
    doesn't have a semicolon in front of it.
  6. Next go to control panel => system => advanced system settings => advanced => environment variables => system variables =>path. Click edit and add to the path variable:
     ;C:\wamp\bin\php\php5.5.12;
    
    Exit the control panel, then re-enter it to double-check that the path really was added to the end. Restart your PC. Start Apache in WampServer.
  7. Go to http://localhost/yioop/ in a browser. You should see the default landing page for Yioop. Click Sign In and use the login: root and no password.
  8. Now go to Yioop => Configure and alter the following settings:
     Search Engine Work Directory: (don't change)
     Default Language: (choose the language you want, or for now leave as English)
     Debug Display: (don't change)
     Search Access: (don't change)
     Crawl Robot Name: TestBot
     Robot Description: This bot is for test purposes. It respects robots.txt
    
    The crawl robot name is what will appear together with a url to a bot.php page in web server log files of sites you crawl. The bot.php page will display what you write in robot description. This should give contact information in case your robot misbehaves. Obviously, you should customize the above to what you want to say.
  9. Go to Manage Crawls. Click on the options to set up where you want to crawl. Type in a name for the crawl and click start crawl.
  10. Let it crawl for a while, until you see the Total URLs Seen > 1. Then click Stop Crawl and wait for the crawl to appear in the previous crawls list. Set it as the default crawl. You should be able to search using this index.

XAMPP on Mac OSX

  1. Download Xampp. These directions were tested on Xampp 5.6.11.
  2. Install Xampp.
  3. After the install, if the Xampp manager-osx.app is running, quit it.
  4. In a text editor, open the file:
     /Applications/XAMPP/xamppfiles/etc/httpd.conf
    
    Locate the lines:
     <IfModule unixd_module>
     #
     # If you wish httpd to run as a different user or group, you must run
     # httpd as root initially and it will switch.  
     #
     # User/Group: The name (or #number) of the user/group to run httpd as.
     # It is usually good practice to create a dedicated user and group for
     # running httpd, as with most system services.
     #
     User daemon
     Group daemon
     </IfModule>
    
    Change User daemon to User your_mac_username and Group daemon to Group staff. After the change, for me, those two lines became:
     User cpollett
     Group staff
    
    These changes are not strictly necessary, but can eliminate headaches if you ever start running any of the Yioop applications at the terminal prompt under your user account. I am assuming if you are using Xampp, it is not for a production server.
  5. Download Yioop and unzip it into
     /Applications/XAMPP/xamppfiles/htdocs
    
  6. Rename the downloaded folder yioop (so you now have a folder /Applications/XAMPP/xamppfiles/htdocs/yioop).
  7. In a text editor, open the file:
     /Applications/XAMPP/xamppfiles/htdocs/yioop/src/library/CrawlDaemon.php
    
    edit the line:
     $php = "php";
    
    change it to (a quick check of this path appears after this list):
     $php = "/Applications/XAMPP/xamppfiles/bin/php";
    
  8. Open the Terminal.app under Applications => Utilities and type the lines:
     sudo chown -R your_mac_username /Applications/XAMPP/xamppfiles
     sudo chown -R root /Applications/XAMPP/xamppfiles/manager-osx.app
    
    Here your_mac_username should be the same username you typed above.
  9. Start Apache by double clicking on Xampp's manager-osx.app, choosing the Manage Servers tab, selecting Apache Web Server, and clicking start.
  10. Point your browser at:
     http://localhost/yioop/
    
  11. You should see the Yioop landing page. Login with username root and empty password.
  12. Now go to Yioop => Configure and alter the following settings:
     Search Engine Work Directory: (don't change)
     Default Language: (choose the language you want, or for now leave as English)
     Debug Display: (don't change)
     Search Access: (don't change)
     Crawl Robot Name: TestBot
     Robot Description: This bot is for test purposes. It respects robots.txt
    
  13. Crawl Robot Name is what will appear together with a url to a bot.php page in the web server log files of sites you crawl. The bot.php page will display what you write in robot description. This should give contact information in case your robot misbehaves. Obviously, you should customize the above to what you want to say.
  14. Now go to Manage Crawls. Click on Options. Set the options you would like for your crawl. Click Save.
  15. Type the name of the crawl and start crawl. Let it crawl for a while, until you see the Total URLs Seen > 1.
  16. Click stop crawl and wait for the crawl to appear in the previous crawls list. Set it as the default crawl. Then you can search using this index.
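
You can double-check that the path you entered into CrawlDaemon.php in step 7 points at a working PHP binary by running it directly in Terminal:
     /Applications/XAMPP/xamppfiles/bin/php -v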

macOS / Mac OSX Server

The instructions given here are for OSX Mountain Lion (10.8) or a more recent version of OSX/macOS. I will use the terms OSX and macOS interchangeably. Apple changes the locations of files slightly between versions, so you might have to do a little exploring to find things on earlier OSX versions.
  1. Turn on Apache with PHP enabled.
  2. Not OSX Server: Traditionally, on (pre-Mountain Lion) OSX, one could go to System Preferences => Sharing and turn on Web Sharing to get the web server running. This option was removed in Mountain Lion; however, from the command line (Terminal), one can type:
     sudo apachectl start
    
    to start the Web server, and similarly,
     sudo apachectl stop
    
    to stop it. Alternatively, to make the web server start each time the machine is turned on, one can type:
     sudo defaults write /System/Library/LaunchDaemons/org.apache.httpd Disabled -bool false
    
  3. By default, document root is /Library/WebServer/Documents. The configuration files for Apache in this setting are located in /etc/apache2. If you want to tweak document root or other Apache settings, look in the folder /etc/apache2/other and edit appropriate files such as httpd-vhosts.conf or httpd-ssl.conf . Before turning on Web Sharing / the web server, you need to edit the file /etc/apache2/httpd.conf. Let X=5 or X=7 (depending on how old a machine you are using). Replace
     #LoadModule phpX_module libexec/apache2/libphpX.so
    
    with
     LoadModule phpX_module libexec/apache2/libphpX.so
    
    You should also make sure that the rewrite_module is being loaded. OSX Server: Pre-Mountain Lion, OSX Server used /etc/apache2 to store its configuration files. Since Mountain Lion, these files are in /Library/Server/Web/Config/apache2. Within this folder, the sites folder holds Apache directives for specific virtual hosts. Make sure the <Directory> tag for the location where you intend to install Yioop has AllowOverride set to All.
  4. OSX Server comes with Server.app, which will actively fight any direct tweaking of configuration files. To get the web server running from Server.app, click on Websites. Make sure "Enable PHP web applications" is checked and Websites is On. The default web site is
     /Library/Server/Web/Data/Sites/Default , 
    
    you probably want to click on + under Websites and specify the document root to be as you like.
  5. For the remainder of this guide, we assume the document root for the web server is /Library/WebServer/Documents. Download Yioop, unpack it into /Library/WebServer/Documents, and rename the Yioop folder to yioop.
  6. Chown this folder to the Webserver user:
     chown -R _www yioop
    
  7. You probably want to make sure Spotlight (Mac's built-in file and folder indexer) doesn't index this folder -- especially during a crawl -- or your system might really slow down. To prevent this, open System Preferences, choose Spotlight, select the Privacy tab, and add the above folder to the list of folders Spotlight shouldn't index. If you are storing crawls on an external drive, you might want to make sure that drive gets automounted without a login. This is useful in the event of a power failure that exceeds your backup power supply time. To do this you can write the preference (a read-back check appears after this list):
     sudo defaults write /Library/Preferences/SystemConfiguration/autodiskmount \
         AutomountDisksWithoutUserLogin -bool true
    
  8. This will mean the hard drive becomes available when the power comes back. To make your Mac restart when the power is back, under System Preferences => Energy Saver there is a check box next to "Start up automatically after a power failure". Check it.
  9. In a browser, go to the page http://localhost/yioop/ . You should see the default Yioop landing page. Sign-in using the login: root and no password. Now go to Yioop => Configure and alter the following settings:
     Search Engine Work Directory: (don't change)
     Default Language: (choose the language you want, or for now leave as English)
     Debug Display: (don't change)
     Search Access: (don't change)
     Crawl Robot Name: TestBot
     Robot Description: This bot is for test purposes. It respects robots.txt
    
  10. Crawl Robot Name is what will appear together with a url to a bot.php page in web server log files of sites you crawl. The bot.php page will display what you write in robot description. This should give contact information in case your robot misbehaves. Obviously, you should customize the above to what you want to say.
  11. Go to Manage Crawls. Click on the options to set up where you want to crawl. Type in a name for the crawl and click start crawl.
  12. Let it crawl for a while, until you see the Total URLs Seen > 1.
  13. Then click Stop Crawl and wait for the crawl to appear in the previous crawls list. Set it as the default crawl. You should be able to search using this index.
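
To confirm that the automount preference from step 7 was actually written, you can read it back (a quick check using defaults read; it should print 1):
     sudo defaults read /Library/Preferences/SystemConfiguration/autodiskmount \
         AutomountDisksWithoutUserLogin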

Ubuntu Linux / Debian (with Suhosin Hardening Patch)

The instructions described here have been tested on Ubuntu 12.04 LTS, Ubuntu 14.04 LTS, Ubuntu 16.04 LTS, and Ubuntu 18.04 LTS.
  1. Get PHP (and optionally Apache) set-up by running the following commands as needed (you might have already done some). For Ubuntu 18.04 LTS type:
     sudo apt install curl
     sudo apt install php7.2-cli
     sudo apt install php7.2-mbstring
     sudo apt install php7.2-sqlite3
     sudo apt install php7.2-curl
     sudo apt install php7.2-gd
     sudo apt install php7.2-xml
     sudo apt install php7.2-bcmath
     sudo apt install apache2
     sudo a2enmod php7.2
     sudo a2enmod rewrite
    
    For Ubuntu 16.04 LTS type:
     sudo apt install curl
     sudo apt install apache2 #if not using apache don't need
     sudo apt install php7.0
     sudo apt install libapache2-mod-php7.0
     sudo apt install php7.0-cli
     sudo apt install php7.0-sqlite3
     sudo apt install php7.0-curl
     sudo apt install php7.0-gd
     sudo apt install php7.0-mbstring
     sudo apt install php7.0-xml
     sudo apt install php7.0-bcmath
     sudo a2enmod php7.0 #if not using apache don't need
     sudo a2enmod rewrite #if not using apache don't need
    
    For Ubuntu 12.04 LTS or Ubuntu 14.04 LTS type:
     sudo apt-get install curl
     sudo apt-get install apache2 #if not using apache don't need
     sudo apt-get install php5
     sudo apt-get install php5-cli
     sudo apt-get install php5-sqlite
     sudo apt-get install php5-curl
     sudo apt-get install php5-gd
     sudo a2enmod rewrite #if not using apache don't need
    
  2. If you are not using Apache, skip ahead to step 7.
  3. After this sequence, depending on which version of Ubuntu you are using, the files /etc/apache2/mods-enabled/php7.2.conf, /etc/apache2/mods-enabled/php7.0.conf, or /etc/apache2/mods-enabled/php5.conf and /etc/apache2/mods-enabled/php7.2.load, /etc/apache2/mods-enabled/php7.0.load, or /etc/apache2/mods-enabled/php5.load should exist and link to the corresponding files in /etc/apache2/mods-available. The sudo a2enmod rewrite line above enables URL rewriting in Apache and should create the file /etc/apache2/mods-enabled/rewrite.load. The configuration files for PHP are /etc/php/7.2/apache2/php.ini, /etc/php/7.0/apache2/php.ini, or /etc/php5/apache2/php.ini (for the Apache module) and /etc/php/7.2/cli/php.ini, /etc/php/7.0/cli/php.ini, or /etc/php5/cli/php.ini (for the command-line interpreter). You want to make changes to both configurations. To get a feel for this, in a text editor (ed, vi, nano, gedit, etc.) modify the line:
     post_max_size = 8M
    
    to
     post_max_size = 32M
    
    This change is not strictly necessary, but will improve performance.
  4. Debian's (not Ubuntu's) PHP version has the Suhosin hardening patch enabled by default. On Yioop before Version 0.941, this caused problems because Yioop made mt_srand calls which were ignored. To fix this, you should add to the end of both php.ini files listed above (alternatively, you could add to /etc/php5/apache2/conf.d/suhosin.ini and /etc/php5/cli/conf.d/suhosin.ini):
     suhosin.srand.ignore = Off
     suhosin.mt_srand.ignore = Off
    
    This modification is not needed for Version 0.941 and higher. Suhosin hardening also imposes a second limit on HTTP POST requests: you should set suhosin.post.max_value_length to the same value you set for post_max_size.
  5. Looking in the folders /etc/php5/apache2/conf.d and /etc/php5/cli/conf.d, you can see which extensions are being loaded by PHP. The presence of files such as curl.ini, gd.ini, and sqlite.ini indicates those extensions will be loaded.
  6. The DocumentRoot for web sites (virtual hosts) served by an Ubuntu Linux machine is typically specified by files in /etc/apache2/sites-enabled. In this example, it was given in the file 000-default and specified to be /var/www/. We are going to install Yioop into /var/www/yioop. The Yioop folder has an .htaccess file with additional configuration directives for Apache. For these to work, you either need to add, before the </VirtualHost> tag in 000-default, lines like:
     <Directory /var/www/yioop >
         Options Indexes FollowSymLinks
         AllowOverride all
     </Directory>
    
    or you need to take the lines from the .htaccess file and add them to a directory tag like the above.
  7. Download Yioop, unpack it into /var/www, and use mv to rename the Yioop folder to yioop.
  8. Restart the web server (a module check appears after this list).
     sudo apachectl stop
     sudo apachectl start
    
  9. In a browser, go to http://localhost/yioop/ under Apache. You should see the default search landing page for Yioop. Click sign in and use the login: root and no password.
  10. Now go to Yioop => Configure and alter the following settings:
     Search Engine Work Directory: (don't change)
     Default Language: (choose the language you want, or for now leave as English)
     Debug Display: (don't change)
     Search Access: (don't change)
     Crawl Robot Name: TestBot
     Robot Description: This bot is for test purposes. It respects robots.txt
    
    The crawl robot name is what will appear together with a url to a bot.php page in web server log files of sites you crawl. The bot.php page will display what you write in robot description. This should give contact information in case your robot misbehaves. Obviously, you should customize the above to what you want to say.
  11. Go to Manage Crawls. Click on the options to set up where you want to crawl. Type in a name for the crawl and click start crawl.
  12. Let it crawl for a while, until you see the Total URLs Seen > 1.
  13. Then click Stop Crawl and wait for the crawl to appear in the previous crawls list. Set it as the default crawl. You should be able to search using this index.
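
After the restart in step 8, you can confirm that the PHP module and rewrite_module from step 1 actually loaded (a quick check; the exact PHP module name depends on your PHP version):
     sudo apachectl -M | grep -Ei 'php|rewrite'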

CentOS Linux

These instructions were tested running a CentOS 7 image in VirtualBox. They show how to get Yioop running under Apache. To get Yioop to run as a server by itself, after installing PHP below, look at the Ubuntu instructions for running Yioop as its own server.
To get started, log in, launch a terminal window, and su root.
  1. CentOS makes use of Secure Linux (SELinux), which greatly restricts what Apache is able to do. To keep things simple, turn off SELinux by editing the file /etc/sysconfig/selinux and setting SELINUX=disabled. Restart the machine.
  2. The image we were using didn't have Apache installed, and at the site suggested for downloading CentOS VMs, some but not all of the images had the nano editor installed. Both can be installed with the commands:
     yum install httpd 
     yum install nano 
    
  3. If you didn't su root, then you will need to put sudo before all commands in this guide, and you will have to make sure the user you are running under is in the list of sudoers.
  4. Apache's configuration files are in the /etc/httpd directory. To get rid of the default web landing page, we switch into the conf.d subfolder and disable welcome.conf. To do this, first type the commands:
     cd /etc/httpd/conf.d
     nano welcome.conf
    
    Then, using the editor, put #'s at the start of each line and save the result. You also want to edit /etc/httpd/conf/httpd.conf to set AllowOverride All within the <Directory "/var/www/html"> block.
  5. Next, we install git, PHP, and the various PHP extensions we need:
     yum install git
     yum install php
     yum install php-mbstring
     yum install php-sqlite3
     yum install gd
     yum install php-gd
    
  6. The default Apache DocumentRoot under Centos is /var/www/html. We will install Yioop in a folder /var/www/html/yioop. This can be accessed by pointing a browser at http://127.0.0.1/yioop/ . To download Yioop to /var/www/html/yioop and to create a work directory, we run the commands:
     cd /var/www/html
     git clone http://seekquarry.com/git/yioop.git yioop
     chown -R apache yioop
    
  7. Restart/start the web server (a quick check appears after this list):
     service httpd stop
     service httpd start
    
  8. Go to http://localhost/yioop/. You should see the default Yioop landing page. Then enter root for the username and blank for the password to login.
  9. Now go to Yioop => Configure and input the following settings:
     Search Engine Work Directory: (don't change)
     Default Language: (choose the language you want, or for now leave as English)
     Debug Display: (don't change)
     Search Access: (don't change)
     Crawl Robot Name: TestBot
     Robot Description: This bot is for test purposes. It respects robots.txt
     If you are having problems with it, please feel free to ban it.
    
    Crawl robot name is what will appear together with a url to a bot.php page in web server log files of sites you crawl. The bot.php page will display what you write in robot description. This should give contact information in case your robot misbehaves. Obviously, you should customize the above to what you want to say.
  10. Go to Manage Crawls. Click on the options to set up where you want to crawl. Type in a name for the crawl and click start crawl.
  11. Let it crawl for a while, until you see the Total URLs Seen > 1.
  12. Then click Stop Crawl and wait for the crawl to appear in the previous crawls list. Set it as the default crawl. You should be able to search using this index.
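
Before crawling, it is worth confirming that Apache loaded mod_php and that the extensions installed in step 5 are present (a quick check, run as root like the rest of this guide):
     httpd -M | grep -i php
     php -m | grep -Ei 'curl|gd|mbstring|sqlite'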

cPanel

Generally, it is not practical to do your crawling on a cPanel-hosted website. However, cPanel works perfectly fine for hosting the results of a crawl you did elsewhere. Here we briefly describe how to do this. In capacity planning your installation, as a rule of thumb, you should expect your index to be of comparable size (number of bytes) to the sum of the sizes of the pages you downloaded; for example, if the pages you crawl total about 5GB, budget roughly another 5GB for the index.
  1. Download Yioop to your local machine.
  2. In cPanel go to File Manager and navigate to the place you want on your server to serve Yioop from. Click upload and choose your zip file so as to upload it to that location.
  3. Select the uploaded file and click extract to extract the zip file to a folder. Reload the page. Rename the extracted folder, if necessary.
  4. For the rest of these instructions, let's assume the site being tested is mysite.my. If at this point one browses to:
     http://mysite.my/yioop/
    
    you should see the landing page of your Yioop instance. You can sign in to this instance using the username root and a blank password.
  5. Go to Manage Account and give yourself a better login and password.
  6. Go to Configure. Look at Component Check and make sure it says Checks Passed. Otherwise, you might have to ask your site provider to upgrade things.
  7. cPanel machines tend to be underpowered so you might want to crawl elsewhere using one of the other install guides then upload the crawl results to your cPanel site.
  8. After performing a crawl, go to Manage Crawls on the machine where you performed the crawl. Look under Previous Crawls and locate the crawl you want to upload. Note its timestamp.
  9. Go to THIS_MACHINES_WORK_DIRECTORY/cache. Locate the folder IndexDatatimestamp, where timestamp is the timestamp of the crawl you want, and ZIP this folder (see the sketch at the end of this section).
  10. In FileManager, under cPanel on the machine you want to host your crawl, navigate to
        yioop_data/cache.
    
  11. Upload the ZIP and extract it.
  12. Go to Manage Crawls on this instance of Yioop, locate this crawl under Previous Crawls, and set it as the default crawl. You should now be able to search and get results from the crawl.
You will probably want to uncheck Cache in the Page Options => Search Time activity, as in this hosted setting it is somewhat hard to get Yioop's cache page feature (which lets users see complete caches of web pages by clicking a link) to work.
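
For steps 8-11, the transfer from the crawl machine might look as follows (a sketch; 1499948689 stands in for your crawl's actual timestamp, and THIS_MACHINES_WORK_DIRECTORY for your actual work directory):
     cd THIS_MACHINES_WORK_DIRECTORY/cache
     zip -r IndexData1499948689.zip IndexData1499948689
You would then upload IndexData1499948689.zip with cPanel's File Manager and extract it into yioop_data/cache on the hosted site.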

Systems with Multiple Queue Servers

This section assumes you have already successfully installed and performed crawls with Yioop in the single queue_server setting, and have succeeded in using Manage Machines to start and stop a queue_server and fetcher. If not, you should consult one of the installation guides above or the general Yioop Documentation.
Before we begin, what are the advantages of using more than one queue_server?
  1. If the queue_servers are running on different processors then they can each be indexing part of the crawl data independently and so this can speed up indexing.
  2. After the crawl is done, the index will typically exist on multiple machines and each needs to search a smaller amount of data before sending it to the name server for final merging. So queries can be faster.
For the purposes of this note, we will consider the case of two queue servers; the same idea works for more. To keep things especially simple, we have both of these queue servers on the same laptop. Advantages (1) and (2) will likely not apply in this case, but we are describing this set-up for testing purposes -- you can take the same idea and put the queue servers on different machines after going through this tutorial.
  1. Download and install Yioop as you would in the single queue_server case. But do this twice. For example, on your machine, if you are running under a web server such as Apache, under its document root you might have two subfolders
     somewhere/yioop1
    
    and
     somewhere/yioop2
    
    each with a complete copy of Yioop. If you are running Yioop using the built-in web server rather than Apache, make sure to start each instance with a different port number:
     php somewhere/yioop1/index.php 8080
     php somewhere/yioop2/index.php 8081
    
    We will use the copy somewhere/yioop1 as an instance of Yioop with both a name server and a queue server; the somewhere/yioop2 will be an instance with just a queue server.
  2. You should leave the work directories of these two instances at their default values. So work directories of these two instances should be different! For each crawl in the multiple queue server setting, each instance will have a copy of those documents it is responsible for. So if we did a crawl with timestamp 10, each instance would have a WORK_DIR/cache/IndexData10 folder and these folders would be disjoint from any other instance.
  3. On the Configure page for each instance, make sure under the Search Access fieldset Web, RSS, and API are checked.
  4. Next click on Server Settings. Make sure the name server and server key are the same for both instances. That is, in the Name Server Set-up fieldset, one might set:
     Server Key:123
     Name Server URL:http://yioop_1_url/
    
    The Crawl Robot Name should also be the same for the two instances, say:
     TestBotFeelFreeToBan
    
    but we want the Robot Instance to be different, say 1 and 2.
  5. Go to the Manage Machines element for somewhere/yioop1, which is the name server. Only the name server needs to manage machines, so we won't do this for somewhere/yioop2 (or for any other queue servers if we had them).
  6. Add machines for each Yioop instance we want to manage with the name server. In this particular case, fill out and submit the Add Machine form twice, the first time with:
     Machine Name:Local1
     Machine Url:http://yioop_1_url/
     Is Mirror: unchecked
     Has Queue Server: checked
     Num Fetchers: 1
    
    the second time with:
     Machine Name:Local2
     Machine Url:http://yioop_2_url/
     Is Mirror: unchecked
     Has Queue Server: checked
     Num Fetchers: 1
    
  7. The Machine Name should be different for each Yioop instance, but can otherwise be whatever you want. Is Mirror controls whether this is a replica of some other node -- I'll save that for a different install guide at some point. If we wanted to run more fetchers we could have chosen a bigger number for Num Fetchers (fetchers are the processes that download web pages).
  8. After the above steps, there should be two machines listed under Machine Information. Click the On button for the queue server and the fetcher of each. They should turn green. If you click the log link, you should start seeing new messages (it refreshes once every 30 seconds) after at most a minute or so.
  9. At this point you are ready to crawl in the multiple queue server setting. You can use Manage Crawl to set-up, start and stop a crawl exactly as in the single queue_server setting.
  10. Perform a crawl and set it as the default index. You can then turn off all the queue servers and fetchers in Manage Machines, if you like.
  11. If you type a query into the search bar of the name server (somewhere/yioop1), you should be getting merged results from both queue servers. To check that this is working, under Configure on the name server (somewhere/yioop1), make sure Query Info is checked and that Use Memcache and Use FileCache are not checked -- the latter two can be turned on later, once we know things are working. When you perform a query now, at the bottom of the page you should see a horizontal rule followed by Query Statistics, followed by all the queries performed in calculating the results. One of these should be PHRASE QUERY. Underneath it you should see Lookup Offset Times, and beneath this, Machine Subtimes: ID_0 and ID_1. If these appear, you know it's working.
When a query is typed into the name server, it tacks no:network onto the query and asks it of all the queue servers. It then merges the results. So if you type "hello" as the search, i.e., if you go to the URL
 http://yioop_1_url/?q=hello
the somewhere/yioop1 script will make in parallel the curl requests
 http://yioop_1_url/?q=hello&network=false&raw=1 
    (raw=1 means no grouping)
 http://yioop_2_url/?q=hello&network=false&raw=1
gets the results back, and merges them. Finally, it returns the result to the user. The network=false tells http://yioop_1_url/ to actually do the query lookup rather than make a network request.
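
You can watch this fan-out yourself by issuing the two sub-queries directly with curl, using the placeholder hosts from above; each should return ungrouped results from just that one queue server:
 curl 'http://yioop_1_url/?q=hello&network=false&raw=1'
 curl 'http://yioop_2_url/?q=hello&network=false&raw=1'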