src/library/WikiParser.php

SeekQuarry/Yioop -- Open Source Pure PHP Search Engine, Crawler, and Indexer

Copyright (C) 2009 - 2020 Chris Pollett chris@pollett.org

LICENSE:

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.

END LICENSE

Classes

WikiParser Class with methods to parse MediaWiki documents, both within Yioop and when Yioop indexes MediaWiki dumps such as those from Wikipedia.

Functions

makeTableCallback()

makeTableCallback(array  $matches) 

Callback used by a preg_replace_callback in nextPage to make a table

Parameters

array $matches

regular expression matches holding the table cells
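As a rough illustration of how a callback of this kind could turn matched table markup into HTML, here is a minimal sketch. It is not Yioop's implementation: the function name exampleMakeTable, the regular expression, and the assumption that $matches[1] holds the body of a {| ... |} block are all hypothetical.

```php
<?php
// Hypothetical sketch: build an HTML table from the body of a {| ... |} block.
// Assumes rows are separated by "|-" lines and cells by "||".
function exampleMakeTable(array $matches): string
{
    $html = "<table>\n";
    foreach (explode("\n|-", $matches[1]) as $row) {
        // Drop surrounding whitespace and the leading "|" of the row.
        $cells = explode("||", ltrim(trim($row), "|"));
        $html .= "<tr>";
        foreach ($cells as $cell) {
            $html .= "<td>" . trim($cell) . "</td>";
        }
        $html .= "</tr>\n";
    }
    return $html . "</table>";
}

$wiki = "{|\n| a || b\n|-\n| c || d\n|}";
// The pattern below is an assumption made for this example only.
$table = preg_replace_callback('/\{\|(.*?)\n\|\}/s', 'exampleMakeTable', $wiki);
```

Real wiki tables also carry attributes, captions, and header cells, which this sketch ignores.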

citeCallback()

citeCallback(array  $matches, integer  $init = -1) : string

Used to convert {{cite }} to a numbered link to a citation

Parameters

array $matches

from regular expression to check for {{cite }}

integer $init

used to initialize counter for citations

Returns

string —

an HTML link to the citation in the current document
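The signature suggests a counter that persists between calls and can be reset through $init. A minimal sketch of that pattern follows, assuming a static counter, an empty return value on reset, and an anchor format of #ref_N. The name exampleCiteCallback, the pattern, and the markup are hypothetical, not taken from Yioop.

```php
<?php
// Hypothetical sketch: number each {{cite ...}} occurrence in order.
function exampleCiteCallback(array $matches, int $init = -1): string
{
    static $ref_count = 1;
    if ($init != -1) {
        // Assumption: a non-default $init resets the counter and emits nothing.
        $ref_count = $init;
        return "";
    }
    $out = "<sup>(<a href=\"#ref_{$ref_count}\">{$ref_count}</a>)</sup>";
    $ref_count++;
    return $out;
}

exampleCiteCallback([], 1); // reset the counter before parsing a document
$wiki = "Fact one{{cite web|url=a}} and fact two{{cite book|title=b}}.";
// preg_replace_callback passes only $matches, so $init keeps its default.
$cited = preg_replace_callback('/\{\{cite(.+?)\}\}/si',
    'exampleCiteCallback', $wiki);
```

Because preg_replace_callback supplies a single argument, the second parameter is only ever set by a direct call such as the reset shown above.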

fixLinksCallback()

fixLinksCallback(array  $matches) : string

Used to change spaces to underscores in links generated from our earlier matching rules

Parameters

array $matches

from regular expression to check for links

Returns

string —

result of correcting link
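A minimal sketch of the space-to-underscore correction, assuming the earlier rules produced plain anchor tags and that $matches[1] is the link target and $matches[2] the link text. The name exampleFixLinks and the pattern are hypothetical.

```php
<?php
// Hypothetical sketch: replace spaces in a link target with underscores.
function exampleFixLinks(array $matches): string
{
    // Only the target is changed; the visible link text keeps its spaces.
    $target = str_replace(" ", "_", $matches[1]);
    return '<a href="' . $target . '">' . $matches[2] . '</a>';
}

$html = 'See <a href="Main Page">the main page</a>.';
$fixed = preg_replace_callback('/<a href="([^"]*)">(.*?)<\/a>/s',
    'exampleFixLinks', $html);
```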

base64EncodeCallback()

base64EncodeCallback(array  $matches) : string

Callback used to base64 encode the contents of nowiki tags so they won't be manipulated by wiki replacements.

Parameters

array $matches

$matches[1] should contain the contents of a nowiki tag

Returns

string —

base64-encoded contents surrounded by an escaped nowiki tag.
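The protection step can be sketched as follows, assuming the escaped tag is written with &lt; and &gt; entities. The name exampleNowikiEncode and the pattern are hypothetical; base64_encode is the standard PHP function.

```php
<?php
// Hypothetical sketch: hide nowiki contents from later wiki substitutions.
function exampleNowikiEncode(array $matches): string
{
    // Once encoded, the triple-quote bold markup no longer appears literally.
    return "&lt;nowiki&gt;" . base64_encode($matches[1]) . "&lt;/nowiki&gt;";
}

$wiki = "Bold is written <nowiki>'''like this'''</nowiki>.";
$protected = preg_replace_callback('/<nowiki>(.+?)<\/nowiki>/s',
    'exampleNowikiEncode', $wiki);
```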

spaceEncodeCallback()

spaceEncodeCallback(array  $matches) : string

Callback used to encode the contents of pre tags so they won't accidentally get sub-pre tags because several of their lines begin with spaces

Parameters

array $matches

$matches[1] should contain the contents of a pre tag

Returns

string —

encoded contents surrounded by an escaped pre tag.
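The documentation does not say which encoding is used. One possible approach, shown here purely as an assumption, is to swap the first leading space of each line for a &nbsp; entity so that a later "line starts with a space" rule cannot fire inside the block. The name exampleSpaceEncode and the pattern are hypothetical.

```php
<?php
// Hypothetical sketch: neutralize leading spaces inside a pre block.
function exampleSpaceEncode(array $matches): string
{
    // With the m modifier, ^ matches at the start of every line.
    $body = preg_replace('/^ /m', '&nbsp;', $matches[1]);
    return "&lt;pre&gt;" . $body . "&lt;/pre&gt;";
}

$wiki = "<pre>if (x) {\n    y();\n}</pre>";
$encoded = preg_replace_callback('/<pre>(.+?)<\/pre>/s',
    'exampleSpaceEncode', $wiki);
```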

spanEncodeCallback()

spanEncodeCallback(array  $matches) : string

Callback used to encode the contents of span tags so that newlines within them don't accidentally get treated as new wiki paragraphs

Parameters

array $matches

$matches[1] should contain the contents of a span tag

Returns

string —

encoded contents surrounded by an escaped span tag.
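The idea of hiding newlines from the paragraph rules can be sketched as below. The encoding (a &#10; entity per newline), the escaped span wrapper, the name exampleSpanEncode, and the pattern are all assumptions made for illustration.

```php
<?php
// Hypothetical sketch: keep newlines inside a span from starting paragraphs.
function exampleSpanEncode(array $matches): string
{
    // Replace each literal newline with its numeric character entity.
    $body = str_replace("\n", "&#10;", $matches[1]);
    return "&lt;span&gt;" . $body . "&lt;/span&gt;";
}

$wiki = "<span>first line\n\nsecond line</span>";
$encoded = preg_replace_callback('/<span>(.+?)<\/span>/s',
    'exampleSpanEncode', $wiki);
```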

base64DecodeCallback()

base64DecodeCallback(array  $matches) : string

Callback used to base64 decode the contents of nowiki tags that were previously base64 encoded (see base64EncodeCallback), after all MediaWiki substitutions have been done

Parameters

array $matches

$matches[1] should contain the contents of a nowiki tag

Returns

string —

base64-decoded contents surrounded by a pre-formatted tag.
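A minimal sketch of the restoring step, assuming the protected text sits between entity-escaped nowiki tags and that the pre-formatted tag is a plain pre element. The name exampleNowikiDecode and the pattern are hypothetical; base64_encode and base64_decode are the standard PHP functions.

```php
<?php
// Hypothetical sketch: restore nowiki contents after wiki substitutions ran.
function exampleNowikiDecode(array $matches): string
{
    return "<pre>" . base64_decode($matches[1]) . "</pre>";
}

// Build a protected string by hand so the example stands alone.
$original = "'''not bold'''";
$protected = "Text &lt;nowiki&gt;" . base64_encode($original)
    . "&lt;/nowiki&gt; more.";
$restored = preg_replace_callback('/&lt;nowiki&gt;(.+?)&lt;\/nowiki&gt;/s',
    'exampleNowikiDecode', $protected);
```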

spaceDecodeCallback()

spaceDecodeCallback(array  $matches) : string

Cleans up pre tags after the other wiki rules have been applied

Parameters

array $matches

$matches[1] should contain the contents of a pre tag

Returns

string —

cleaned contents surrounded by a pre-formatted tag.
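What the clean-up involves is not specified here. As one hypothetical example, if leading spaces had earlier been replaced by &nbsp; entities inside an escaped pre tag, the reverse step might look like the sketch below. The name exampleSpaceDecode, the pattern, and the encoding being undone are assumptions.

```php
<?php
// Hypothetical sketch: undo a leading-space encoding and emit a real pre tag.
function exampleSpaceDecode(array $matches): string
{
    // Turn a leading &nbsp; on any line back into an ordinary space.
    $body = preg_replace('/^&nbsp;/m', ' ', $matches[1]);
    return "<pre>" . $body . "</pre>";
}

$escaped = "&lt;pre&gt;if (x) {\n&nbsp;   y();\n}&lt;/pre&gt;";
$cleaned = preg_replace_callback('/&lt;pre&gt;(.+?)&lt;\/pre&gt;/s',
    'exampleSpaceDecode', $escaped);
```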